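Show HN: FalkorDB – major update to the open-source graph database (C/Rust)
Hey HN,
I'm Roi, one of the co-creators of FalkorDB. We're a growing team working on a graph database designed for production workloads and GraphRAG systems. The new release (v4.10.0) is out, and I wanted to share some of the updates and ask for feedback from folks who care about performance and memory efficiency in graph-heavy systems.
FalkorDB is an open-source property graph database that supports OpenCypher (with our own extensions) and is used under the hood for retrieval-augmented generation setups where accuracy matters.
The big problem we're working on is scaling graph databases in production without memory bloat or unpredictable performance. Index support for array fields tends to be limited. And if you want to do something basic, like compare the current value to the previous one in a sequence (think time series modeling), the query engine often makes you jump through hoops.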
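We started FalkorDB after working for years on RedisGraph (we were the original authors). Rather than patch the old codebase, we built FalkorDB with a sparse matrix algebra backend for performance. Our goal was to build something that could hold up under pressure, like 10K+ graphs in a single instance, and still let you answer complex queries interactively.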
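For readers who haven't seen graph traversal phrased as linear algebra, here is a rough sketch of the idea in plain Python. It is not FalkorDB's implementation (the internals aren't described in this post): it stores an adjacency matrix sparsely as a dict of row to set of columns and treats one breadth-first hop as a boolean vector-matrix product. The names `vxm` and `bfs_levels` are made up for this example.

```python
# Illustrative only: a toy boolean sparse matrix, not FalkorDB's internal representation.
from typing import Dict, Set

SparseMatrix = Dict[int, Set[int]]  # row -> set of columns that hold a 1


def vxm(frontier: Set[int], adj: SparseMatrix) -> Set[int]:
    """Boolean vector-matrix product: all nodes one hop away from the frontier."""
    out: Set[int] = set()
    for row in frontier:
        out |= adj.get(row, set())
    return out


def bfs_levels(adj: SparseMatrix, source: int) -> Dict[int, int]:
    """BFS as repeated vector-matrix products, masking out visited nodes."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Keep only nodes that have not been assigned a level yet.
        frontier = {n for n in vxm(frontier, adj) if n not in levels}
        for node in frontier:
            levels[node] = depth
    return levels


# Small directed graph given as an edge list.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
adj: SparseMatrix = {}
for src, dst in edges:
    adj.setdefault(src, set()).add(dst)

levels = bfs_levels(adj, 0)
print(levels)
```

Only the non-zero entries are stored, so memory grows with the number of edges rather than with the square of the number of nodes.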
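To get closer to this goal, we've added the following improvements in this release. We added string interning with a new intern() function. It lets you deduplicate identical strings across graphs, which is surprisingly useful in, for example, recommender systems where you have millions of "US" strings. We also added a command (GRAPH.MEMORY USAGE) that breaks down memory consumption by nodes, edges, matrices, and indices (per graph), which is useful when you're trying to figure out whether your heap is getting crushed by edge cardinality or indexing overhead.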
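The general idea behind string interning can be sketched in a few lines of plain Python. This shows the technique only, not how intern() is implemented in FalkorDB; `StringPool` is a hypothetical class invented for this example.

```python
# Illustrative only: a toy intern pool, not FalkorDB's intern() implementation.
from typing import Dict


class StringPool:
    """Keeps one shared copy of each distinct string value."""

    def __init__(self) -> None:
        self._pool: Dict[str, str] = {}

    def intern(self, value: str) -> str:
        # setdefault returns the first object stored for an equal string.
        return self._pool.setdefault(value, value)

    def __len__(self) -> int:
        return len(self._pool)


pool = StringPool()

# Build equal strings at runtime, the way a loader reading rows would.
raw = ["".join(["U", "S"]) for _ in range(1000)]
countries = [pool.intern(s) for s in raw]

print(len(pool))                                  # distinct strings stored
print(all(c is countries[0] for c in countries))  # do all entries share one object?
```

Every property value that goes through the pool points at the same object, so a million "US" values cost one string plus a million references.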
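Indexing got smarter too: arrays are now natively indexable in a way that's actually usable in production (Neo4j doesn't do this natively, last I checked).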
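One common way to make array-valued properties searchable is an inverted index keyed on the individual elements. The sketch below, in plain Python, shows that general approach under the assumption that lookups ask "which nodes have an array containing X"; it is not a description of FalkorDB's index structure, and `ArrayIndex` is a hypothetical name.

```python
# Illustrative only: an inverted index over array elements, not FalkorDB's index structure.
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set


class ArrayIndex:
    """Maps each array element to the ids of the nodes whose array contains it."""

    def __init__(self) -> None:
        self._postings: Dict[Hashable, Set[int]] = defaultdict(set)

    def add(self, node_id: int, values: Iterable[Hashable]) -> None:
        for value in values:
            self._postings[value].add(node_id)

    def containing(self, value: Hashable) -> Set[int]:
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._postings.get(value, ()))


index = ArrayIndex()
index.add(1, ["graph", "rag"])
index.add(2, ["graph", "fraud"])
index.add(3, ["timeseries"])

print(index.containing("graph"))
print(index.containing("missing"))
```

A membership lookup becomes a single dictionary access instead of a scan over every node's array.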
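On the analytics side, we added CDLP (community detection via label propagation), WCC (weakly connected components), and betweenness centrality, all exposed as procedures. These came out of working with teams in fraud detection and behavioral clustering, where you don't want to guess the number of communities in advance.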
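As a rough illustration of why label propagation suits cases where the number of communities isn't known up front, here is a minimal synchronous variant in plain Python. It sketches the general technique, with a smallest-label tie-break chosen for determinism; it is not FalkorDB's procedure, and the function name and the `max_iters` parameter are assumptions made for this example.

```python
# Illustrative only: synchronous label propagation, not FalkorDB's CDLP procedure.
from collections import Counter
from typing import Dict, List


def label_propagation(adj: Dict[int, List[int]], max_iters: int = 10) -> Dict[int, int]:
    """Each node repeatedly adopts the most frequent label among its neighbours."""
    labels = {node: node for node in adj}  # start with one label per node
    for _ in range(max_iters):
        new_labels: Dict[int, int] = {}
        for node, neighbours in adj.items():
            if not neighbours:
                new_labels[node] = labels[node]
                continue
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            # Break ties by taking the smallest label so runs are repeatable.
            new_labels[node] = min(l for l, c in counts.items() if c == best)
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels


# Two triangles joined by a single edge (2-3), as an undirected adjacency list.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}

labels = label_propagation(adj)
print(labels)
```

On this toy graph the labels settle into two groups, one per triangle, without the number of communities being passed in anywhere. The label values themselves are arbitrary identifiers.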
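If you want to try FalkorDB, we recommend running it via Docker. The code is also available on GitHub: [https://github.com/FalkorDB/falkordb](https://github.com/FalkorDB/falkordb).
Docs are at [https://docs.falkordb.com](https://docs.falkordb.com).
Curious to hear from anyone who's building graph-heavy systems, especially if you've hit memory or indexing limits elsewhere. We're heads-down building and always learning, and we're grateful for any feedback or test cases you throw at us.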