


Hi HN,

KafScale is Kafka-compatible streaming, k8s native, where S3 is the source of truth and brokers hold no persistent state. Written in Go, runs on Kubernetes.

Built this after years of operating Kafka and hitting the same walls: broker failures that take hours to recover, partition rebalancing that blocks deploys, disk capacity planning that never ends.

How it works:

- Producers and consumers use standard Kafka clients
- Brokers buffer in memory, flush to S3
- etcd stores metadata and consumer group state
- Recovery means restarting a pod and reading from S3
- Optional Iceberg processor reads segments directly from S3, bypasses brokers entirely for batch/analytical workloads

What you give up: latency is 400-500ms (S3 round-trip), no transactions, no compacted topics. It's not a 100% replacement.

What you get: brokers are disposable, scaling is just replica count, no disk management, direct access to streamed data over S3 ACL.

License: Apache 2.0
GitHub: https://github.com/novatechflow/kafscale
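
To make the "buffer in memory, flush to S3, recover by re-reading" flow from the "How it works" list concrete, here is a rough, stdlib-only Go sketch. It is not KafScale's code: `ObjectStore`, `memStore`, `Metadata`, `Broker`, `ReadAll`, the segment key layout, and the newline-delimited segment encoding are all hypothetical names and choices made up for illustration. An in-memory map stands in for S3 and a plain struct stands in for the metadata that etcd would hold.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// ObjectStore is a hypothetical stand-in for S3, not KafScale's API.
type ObjectStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// memStore is an in-memory ObjectStore so the sketch runs without AWS.
// It is not safe for concurrent use.
type memStore struct{ objects map[string][]byte }

func newMemStore() *memStore { return &memStore{objects: map[string][]byte{}} }

func (s *memStore) Put(key string, data []byte) error {
	s.objects[key] = append([]byte(nil), data...) // store a copy
	return nil
}

func (s *memStore) Get(key string) ([]byte, error) {
	data, ok := s.objects[key]
	if !ok {
		return nil, errors.New("object not found: " + key)
	}
	return data, nil
}

// Metadata is a hypothetical stand-in for state kept outside the broker
// (etcd in the post): the ordered segment keys and the next offset.
type Metadata struct {
	Segments   []string
	NextOffset int64
}

// Broker keeps only an in-memory buffer; everything durable lives in the
// store and the metadata.
type Broker struct {
	store   ObjectStore
	meta    *Metadata
	buf     [][]byte
	flushAt int // flush once this many records are buffered
}

// Produce buffers one record and flushes when the buffer is full.
func (b *Broker) Produce(record []byte) error {
	b.buf = append(b.buf, record)
	if len(b.buf) >= b.flushAt {
		return b.Flush()
	}
	return nil
}

// Flush writes the buffered records as one segment object, then records the
// segment in the metadata. Assumed encoding: newline-delimited records, so
// records must not contain '\n'.
func (b *Broker) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	key := fmt.Sprintf("topic-0/%020d.seg", b.meta.NextOffset)
	if err := b.store.Put(key, bytes.Join(b.buf, []byte("\n"))); err != nil {
		return err
	}
	b.meta.Segments = append(b.meta.Segments, key)
	b.meta.NextOffset += int64(len(b.buf))
	b.buf = nil
	return nil
}

// ReadAll replays every flushed record using only the store and metadata,
// which is all a restarted broker or a batch reader would need here.
func ReadAll(store ObjectStore, meta *Metadata) ([][]byte, error) {
	var out [][]byte
	for _, key := range meta.Segments {
		data, err := store.Get(key)
		if err != nil {
			return nil, err
		}
		out = append(out, bytes.Split(data, []byte("\n"))...)
	}
	return out, nil
}

func main() {
	store, meta := newMemStore(), &Metadata{}
	b := &Broker{store: store, meta: meta, flushAt: 2}
	for _, r := range []string{"a", "b", "c"} {
		if err := b.Produce([]byte(r)); err != nil {
			panic(err)
		}
	}
	if err := b.Flush(); err != nil {
		panic(err)
	}

	// Simulate a pod restart: throw the broker away and build a fresh one
	// from nothing but the store and the metadata.
	b = &Broker{store: store, meta: meta, flushAt: 2}
	records, err := ReadAll(b.store, b.meta)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(records), meta.NextOffset)
}
```

The point of the sketch is that the replacement `Broker` starts with an empty buffer and still sees every flushed record, which is why a design like this can treat brokers as disposable. It says nothing about when a real broker acknowledges a produce request or how it batches writes to S3.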

Show HN: Kafka-compatible streaming on S3 with stateless brokers