Bzfs 1.13.0 – 1-second (even sub-second) ZFS replication across clusters
bzfs is a simple, reliable CLI for replicating ZFS snapshots (zfs send/receive) locally or over SSH. Its companion, bzfs_jobrunner, turns that into periodic snapshot/replication/pruning jobs across N source hosts and M destination hosts, driven by one versioned job config.

This release makes 1-second replication frequency practical for small incrementals, and even sub-second frequency possible in constrained setups (low RTT, few datasets, daemon mode).

v1.13.0 focuses on cutting per-iteration latency, the enemy of high-frequency replication at fleet scale:

- SSH reuse across datasets and on startup: fewer handshakes and fewer round-trips, which is where small incremental sends spend much of their time (see the sketch after this list).
- Earlier stream start: estimate "bytes to send" in parallel so the data path can open sooner instead of blocking on preflight.
- Smarter caching: faster snapshot list hashing and shorter cache paths to reduce repeated ZFS queries in tight loops.
- More resilient connects: retry the SSH control path briefly before failing to smooth over transient blips.
- Cleaner ops: normalized exit codes; suppress "Broken pipe" noise when a user kills a pipeline.
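For intuition on what the SSH reuse saves: it is conceptually the same mechanism as OpenSSH connection multiplexing. Below is a minimal illustration of that mechanism only; bzfs manages its own control sockets internally, so this is not its literal invocation, and the control path and remote command are just examples.

    # First call opens a master connection and keeps it alive for 60s; later
    # calls reuse the socket and skip the TCP + key-exchange handshake.
    ssh -o ControlMaster=auto -o ControlPersist=60s \
        -o ControlPath=~/.ssh/cm-%r@%h:%p \
        user@host zfs list -d 1 -t snapshot -o name pool/src/ds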
Why this matters

- At 1s cadence, fixed costs (session setup, snapshot enumeration) dominate. Shaving RTTs and redundant `zfs list` calls yields bigger wins than raw throughput.
- For fleets, the tail matters: reducing per-job jitter and startup overhead improves end-to-end freshness when multiplied by N×M jobs.

1-second (and sub-second) replication
- Use daemon mode to avoid per-process startup costs; keep the process hot and loop at `--daemon-replication-frequency` (e.g., `1s`, even `100ms` for constrained cases); see the sketch after this list.
- Reuse SSH connections (now default) to avoid handshakes even for new processes.
- Keep per‑dataset snapshot counts low and prune aggressively; fewer entries make `zfs list -t snapshot` faster.
- Limit scope to only datasets that truly need the cadence (filters like `--exclude-dataset*`, `--skip-parent`).
- In fleets, add small jitter to avoid thundering herds, and cap workers to match CPU, I/O, and link RTT.
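A minimal sketch of that hot loop, assuming a shared jobconfig wrapper script; the name `my_bzfs_job.py` is a placeholder, and the jobrunner README has real jobconfig examples:

    # One long-lived process per host replicates in a loop every second, so
    # interpreter and SSH startup costs are paid once, not per iteration.
    ./my_bzfs_job.py --replicate --daemon-replication-frequency 1s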
How it works (nutshell)

- Incremental sends from the latest common snapshot (roughly the raw pipeline sketched after this list); bookmarks supported for safety and reduced state.
- Persistent SSH sessions are reused across datasets/zpools and across runs to avoid handshake/exec overhead.
- Snapshot enumeration uses a cache to avoid re‑scanning when nothing changed.
- Job orchestration via bzfs_jobrunner: same config file runs on all hosts; add jitter to avoid thundering herds; set worker counts/timeouts for scale.
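To make the first bullet concrete: per dataset, one pull iteration boils down to roughly this hand-rolled pipeline. It is illustrative only; `@common` and `@latest` are placeholder snapshot names, and bzfs layers safety checks, bookmark handling, caching, and parallelism on top.

    # Send only the delta since the newest snapshot both sides already share,
    # over the already-established SSH session, into the backup dataset.
    ssh user@host zfs send -i pool/src/ds@common pool/src/ds@latest \
      | zfs receive -u pool/backup/ds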
High-frequency tips

- Prune at a frequency proportional to snapshot creation to keep enumerations fast.
- Use daemon mode; split snapshot/replicate/prune into dedicated loops.
- Add small random start jitter across hosts to reduce cross-fleet contention (see the sketch after this list).
- Tune jobrunner `--workers` and per-worker timeouts for your I/O and RTT envelope.
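bzfs_jobrunner has jitter support of its own (see its README). If you instead kick off runs from an external per-host scheduler, a small bash snippet like the following spreads start times by up to one second (reusing the placeholder jobconfig name from the earlier sketch):

    # Sleep a random 0-999 ms before starting, so N hosts don't all hit the
    # destination at the same instant ($RANDOM requires bash).
    sleep "0.$(printf '%03d' $((RANDOM % 1000)))" && ./my_bzfs_job.py --replicate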
Quick examples

- Local replicate: `bzfs pool/src/ds pool/backup/ds`
- Pull from remote: `bzfs user@host:pool/src/ds pool/backup/ds`
- Jobrunner (periodic): run the shared jobconfig with daemon mode for 1s cadence: `... --replicate --daemon-replication-frequency 1s` (sub-second like `100ms` is possible in constrained setups). Use separate daemons for `--create-src-snapshots`, `--replicate`, and `--prune-*` (see the sketch below).
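A rough sketch of that split, reusing the hypothetical `my_bzfs_job.py` jobconfig from above; the exact `--prune-*` flag names are documented in the jobrunner README:

    # Dedicated loops per host, so slow snapshot creation or pruning never
    # delays the 1-second replication cadence.
    ./my_bzfs_job.py --create-src-snapshots                          # snapshot loop
    ./my_bzfs_job.py --replicate --daemon-replication-frequency 1s   # hot replication loop
    # ...plus a separate loop invoking the relevant --prune-* flags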
Links

- Code and docs: https://github.com/whoschek/bzfs
- README: quickstart, filters, safety flags, examples
- Jobrunner README: multi‑host orchestration, jitter, daemon mode, frequencies
- 1.13.0 diff: https://github.com/whoschek/bzfs/compare/v1.12.0...v1.13.0

Notes
- Standard tooling only (ZFS/Unix and Python); no extra runtime deps.

I'd love performance feedback from folks running 1s or sub-second replication across multiple datasets/hosts:
- per-iteration wall time, number/size of incremental snapshots, dataset counts, and link RTTs help contextualize results.

Happy to answer questions!