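Show HN: Kafka-compatible streaming on S3 with stateless brokers
Hi HN,
KafScale is a Kafka-compatible, Kubernetes-native streaming system in which S3 is the source of truth and brokers hold no persistent state. It is written in Go.
I built it after years of operating Kafka and hitting the same walls: broker failures that take hours to recover from, partition rebalancing that blocks deploys, and disk capacity planning that never ends.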
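How it works:
- Producers and consumers use standard Kafka clients
- Brokers buffer records in memory and flush them to S3
- etcd stores metadata and consumer group state
- Recovery means restarting a pod and reading from S3
- An optional Iceberg processor reads segments directly from S3, bypassing the brokers entirely for batch/analytical workloads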
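To make the buffer-and-flush and recovery steps above concrete, here is a minimal sketch in Go. It is not KafScale's actual code: `MemStore` is a hypothetical in-memory stand-in for an S3 bucket, and the segment key layout, the length-prefixed record framing, and the `Broker`, `Recover`, and `decode` names are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

// MemStore is a hypothetical in-memory stand-in for an S3 bucket.
type MemStore struct{ objects map[string][]byte }

func NewMemStore() *MemStore { return &MemStore{objects: map[string][]byte{}} }

// Put stores a copy of data under key.
func (s *MemStore) Put(key string, data []byte) {
	s.objects[key] = append([]byte(nil), data...)
}

// Keys returns all keys with the given prefix in lexicographic order.
func (s *MemStore) Keys(prefix string) []string {
	var keys []string
	for k := range s.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// Broker buffers records in memory and keeps no durable state of its own.
type Broker struct {
	store      *MemStore
	topic      string
	buf        [][]byte
	baseOffset int64 // offset of the first buffered record
	maxBuf     int   // flush once this many records are buffered
}

// segmentKey zero-pads the base offset so keys sort in offset order (assumed layout).
func segmentKey(topic string, base int64) string {
	return fmt.Sprintf("%s/%020d.seg", topic, base)
}

// Append buffers a record, returns its offset, and flushes when the buffer is full.
func (b *Broker) Append(rec []byte) int64 {
	off := b.baseOffset + int64(len(b.buf))
	b.buf = append(b.buf, rec)
	if len(b.buf) >= b.maxBuf {
		b.Flush()
	}
	return off
}

// Flush writes the buffered records to the store as one length-prefixed segment.
func (b *Broker) Flush() {
	if len(b.buf) == 0 {
		return
	}
	var seg []byte
	for _, r := range b.buf {
		var hdr [4]byte
		binary.BigEndian.PutUint32(hdr[:], uint32(len(r)))
		seg = append(seg, hdr[:]...)
		seg = append(seg, r...)
	}
	b.store.Put(segmentKey(b.topic, b.baseOffset), seg)
	b.baseOffset += int64(len(b.buf))
	b.buf = nil
}

// decode splits a segment back into its records.
func decode(seg []byte) [][]byte {
	var recs [][]byte
	for len(seg) >= 4 {
		n := int(binary.BigEndian.Uint32(seg[:4]))
		recs = append(recs, seg[4:4+n])
		seg = seg[4+n:]
	}
	return recs
}

// Recover builds a fresh broker from nothing but the contents of the store.
func Recover(store *MemStore, topic string, maxBuf int) *Broker {
	b := &Broker{store: store, topic: topic, maxBuf: maxBuf}
	for _, k := range store.Keys(topic + "/") {
		b.baseOffset += int64(len(decode(store.objects[k])))
	}
	return b
}

func main() {
	store := NewMemStore()
	b := Recover(store, "events", 2)
	b.Append([]byte("a"))
	b.Append([]byte("b")) // buffer is full, so both records are flushed as one segment
	b.Append([]byte("c")) // held in memory only
	b.Flush()

	// Simulate a pod restart: the new broker starts with only the store.
	b2 := Recover(store, "events", 2)
	fmt.Println(b2.baseOffset, len(store.Keys("events/"))) // prints "3 2"
}
```

In this sketch, anything still in the buffer when a broker dies is lost, so a real system would have to decide when to acknowledge a produce request relative to the flush. A batch reader could also call `decode` on the stored objects directly without going through a broker, which is roughly the shape of the Iceberg path described in the last bullet.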
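What you give up: latency is 400-500 ms (the S3 round trip), there are no transactions, and there are no compacted topics. It is not a 100% replacement for Kafka.
What you get: brokers are disposable, scaling is just a matter of replica count, there are no disks to manage, and streamed data is directly accessible in S3, governed by S3 ACLs.
License: Apache 2.0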
GitHub: [https://github.com/novatechflow/kafscale](https://github.com/novatechflow/kafscale)