Show HN: Goxe – Fast log clustering on an i5 (down to 1 alloc per log line, aiming for 0)
I’ve just released Goxe v1.4.0. After the last update, I’ve been obsessed with squeezing every bit of performance out of Go.<p>The Big Update: Offline Normalization (-brew)
I added a new mode to process and normalize legacy logs that were already sitting on disk before Goxe was installed. It clusters similar messages, reduces the storage footprint, and ships metrics to a remote server.<p>The Engineering Wins:<p><pre><code> Allocations: I managed to cut the overhead from 2 allocs/op to just 1 per log line, using unsafe zero-copy string conversions and a bufio.Scanner optimization.
The Goal: I’m currently refactoring the core pipeline to hit 0 allocations in the next cycle.
Performance: Still hitting 19k logs/s on an old i5-8250U @ 3.40 GHz with a minimal RAM footprint.
</code></pre>
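For anyone curious what the zero-copy technique looks like in general, here's a minimal sketch. This is not Goxe's actual code: `b2s` is a made-up helper name, and it assumes Go 1.20+ for `unsafe.String`/`unsafe.SliceData`. The trade-off is that the resulting string aliases the scanner's internal buffer, so it is only valid until the next `Scan` call.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unsafe"
)

// b2s reinterprets a byte slice as a string without copying.
// Only safe while the underlying bytes are not mutated or reused.
func b2s(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	input := strings.NewReader("error: disk full\ninfo: started\n")
	sc := bufio.NewScanner(input)
	// Pre-sizing the buffer lets Scanner reuse one allocation across lines.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := b2s(sc.Bytes()) // zero-copy view; valid only until the next Scan
		fmt.Println(line)
	}
}
```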
Why use it? If you have massive log files cluttering your disk, -brew will normalize them into a structured summary ([name]_[date]_normalized.log), saving space and giving you clear stats (Count, First/Last seen) without killing your CPU.<p>I’d love to hear your thoughts on the zero-copy approach.<p>Repo: <a href="https://github.com/DumbNoxx/goxe" rel="nofollow">https://github.com/DumbNoxx/goxe</a>
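For anyone wondering how -brew-style clustering can work at all, here's a rough sketch of the general idea (the masking regex, the `&lt;N&gt;` placeholder, and the struct fields are my assumptions for illustration, not Goxe's internals): variable parts like numbers and hex IDs are masked so that similar lines collapse into one cluster with a count and first/last-seen markers.

```go
package main

import (
	"fmt"
	"regexp"
)

// varPat masks variable tokens (hex IDs, numbers) so that
// otherwise-identical log lines hash to the same cluster key.
var varPat = regexp.MustCompile(`0x[0-9a-fA-F]+|\d+`)

type cluster struct {
	Count       int
	First, Last string // timestamps of first/last occurrence
}

func normalize(line string) string {
	return varPat.ReplaceAllString(line, "<N>")
}

func main() {
	clusters := map[string]*cluster{}
	logs := []struct{ ts, msg string }{
		{"10:00", "req 42 took 17ms"},
		{"10:05", "req 99 took 3ms"},
	}
	for _, l := range logs {
		key := normalize(l.msg)
		c, ok := clusters[key]
		if !ok {
			c = &cluster{First: l.ts}
			clusters[key] = c
		}
		c.Count++
		c.Last = l.ts
	}
	for k, c := range clusters {
		fmt.Printf("%s Count=%d First=%s Last=%s\n", k, c.Count, c.First, c.Last)
	}
}
```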