Show HN: CloudSlash – Find AWS waste and generate Terraform state rm commands

1 point by drskyle, about 1 month ago
We've all been there: you find an unused NAT Gateway costing $45/mo. You delete it in the AWS console to stop the billing immediately. But the next time you run terraform plan, it fails because of state drift. Now you have to manually run terraform state rm or import it back to fix the drift. It's tedious, so often we just leave the waste running.

I built CloudSlash to automate the cleanup and the state surgery. It's written in Go (using BubbleTea for the TUI) and solves two engineering problems:

1. Finding "hollow" resources (the graph). Most cost tools just check CloudWatch metrics (CPU < 5%), which creates too much noise. Instead, I build an in-memory graph of the infrastructure to find structural waste. Example: an "active" ELB. It has healthy targets, so the metrics look good. But if you traverse the graph (ELB -> Instance -> Subnet -> Route Table), you might see that the Route Table has no path to an Internet Gateway. The ELB is functionally dead, even if AWS reports it as "healthy."

2. The state mapping. Deleting the resource in AWS is easy. The challenge is mapping a physical ID (e.g., nat-0a1b2c) back to its Terraform address (e.g., module.vpc.aws_nat_gateway.public[0]) so you can remove it from the state file programmatically. I wrote a parser that reads your local .tfstate, handles the complex JSON structure (including nested modules and for_each outputs), and generates a remediation script.

It outputs a shell script (fix_terraform.sh) that runs the necessary terraform state rm commands for you. It never writes to your .tf files directly; it just hands you the script to review and run.

The core logic, scanner, and TUI are open source (AGPLv3). I charge a one-time license fee for the feature that auto-generates the fix scripts for developers, but the forensic analysis/detection is free.

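To make the graph idea in point 1 concrete, here is a minimal Python sketch (the tool itself is written in Go, and this is not its code). It assumes the infrastructure has already been loaded into a plain adjacency map; the node IDs, the `elb/` and `igw-` prefixes, and the function names are all hypothetical. A real check would also need to account for things like internal load balancers, which legitimately have no Internet Gateway route.

```python
from collections import deque

# Hypothetical in-memory graph: node -> nodes it depends on.
# Every ID below is made up for illustration.
EDGES = {
    "elb/web": ["i-aaa"],
    "i-aaa": ["subnet-111"],
    "subnet-111": ["rtb-111"],
    "rtb-111": [],              # route table with no internet gateway route
    "elb/api": ["i-bbb"],
    "i-bbb": ["subnet-222"],
    "subnet-222": ["rtb-222"],
    "rtb-222": ["igw-222"],     # route table with an internet gateway route
}


def reaches_internet_gateway(graph, start):
    """Breadth-first search from start; True if any reachable node is an IGW."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node.startswith("igw-"):
            return True
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def hollow_load_balancers(graph):
    """Return load balancers that have no path to an internet gateway."""
    return sorted(
        node for node in graph
        if node.startswith("elb/") and not reaches_internet_gateway(graph, node)
    )


print(hollow_load_balancers(EDGES))
```

With this sample data, only `elb/web` is reported: its targets may be healthy, but nothing downstream of it leads to a gateway.
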
Repo: https://github.com/DrSkyle/CloudSlash
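The state mapping in point 2 can be sketched roughly as follows, again in Python rather than the tool's Go and not taken from the repo. It assumes a version 4 state file, where each entry in `resources` carries an optional `module` path, a `type`, a `name`, and a list of `instances` with an optional `index_key`. The sample state, the IDs, and the function names are hypothetical, and the raw state layout is an internal format that may change between Terraform releases.

```python
import json
import shlex

# Trimmed-down stand-in for a version 4 .tfstate file; all IDs are made up.
SAMPLE_STATE = json.loads("""
{
  "version": 4,
  "resources": [
    {
      "module": "module.vpc",
      "mode": "managed",
      "type": "aws_nat_gateway",
      "name": "public",
      "instances": [
        {"index_key": 0, "attributes": {"id": "nat-0a1b2c"}},
        {"index_key": 1, "attributes": {"id": "nat-0d4e5f"}}
      ]
    },
    {
      "mode": "managed",
      "type": "aws_eip",
      "name": "nat",
      "instances": [
        {"index_key": "us-east-1a", "attributes": {"id": "eipalloc-0123"}}
      ]
    }
  ]
}
""")


def index_by_physical_id(state):
    """Map each instance's attributes.id to its Terraform resource address."""
    addresses = {}
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # skip data sources; only managed resources are mapped
        prefix = resource["module"] + "." if "module" in resource else ""
        base = f"{prefix}{resource['type']}.{resource['name']}"
        for instance in resource.get("instances", []):
            key = instance.get("index_key")
            if key is None:
                address = base                  # single instance
            elif isinstance(key, int):
                address = f"{base}[{key}]"      # created with count
            else:
                address = f'{base}["{key}"]'    # created with for_each
            physical_id = instance.get("attributes", {}).get("id")
            if physical_id:
                addresses[physical_id] = address
    return addresses


def state_rm_commands(state, deleted_ids):
    """Build one shell-quoted `terraform state rm` line per known physical ID."""
    lookup = index_by_physical_id(state)
    return [
        "terraform state rm " + shlex.quote(lookup[physical_id])
        for physical_id in deleted_ids
        if physical_id in lookup
    ]


for line in state_rm_commands(SAMPLE_STATE, ["nat-0a1b2c", "eipalloc-0123"]):
    print(line)
```

Quoting each address matters because the brackets and double quotes in indexed addresses are otherwise interpreted by the shell. IDs that do not appear in the state are skipped silently here; a real script would probably want to report them.
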