Show HN: Goxe – Fast log clustering on an i5 (down to 1 alloc per log, working toward 0)

Posted by nxus_dev, about 1 month ago
I've just released Goxe v1.4.0. After the last update, I've been obsessed with squeezing every bit of performance out of Go.

The Big Update: Offline Normalization (-brew)

I added a new mode to process and normalize legacy logs that were sitting on disk before Goxe was installed. It clusters similar messages, reduces the storage footprint, and ships metrics to a remote server.

The Engineering Wins:

- Allocations: I managed to reduce the overhead from 2 allocs/op to just 1 per log line, using unsafe zero-copy string conversions and a bufio.Scanner optimization.
- The Goal: I'm currently refactoring the core pipeline to hit 0 allocations in the next cycle.
- Performance: Still hitting 19k logs/s on an old i5-8250U @ 3.40 GHz with a minimal RAM footprint.

Why use it? If you have massive log files cluttering your disk, -brew will normalize them into a structured summary ([name]_[date]_normalized.log), saving space and giving you clear stats (Count, First/Last seen) without hammering your CPU.

I'd love to hear your thoughts on the zero-copy approach.

Repo: https://github.com/DumbNoxx/goxe