Valori – A Python-native vector database I built from the ground up

2 points | by varshith17 | 11 days ago
I’ve been working on a project called Valori, a Python-native vector database I built from the ground up: not by reinventing every algorithm, but by wiring together efficient, well-known indexing and search techniques into a cohesive, hackable framework.

The idea came from my frustration with existing vector DBs that were either too heavy for experimentation or too opaque to modify. I wanted something simple, modular, and extensible, so I built it.

What it does:

- Lets you store, index, and search high-dimensional vectors
- Supports multiple indices (Flat, HNSW, IVF, LSH, Annoy)
- Has memory, disk, and hybrid storage backends
- Includes a full document processing pipeline (parsing, cleaning, chunking, embedding)
- Offers quantization, persistence, and plugin-based extensibility

All written in Python, integrated with NumPy, and production-tested with logging and monitoring built in.

Install:

    pip install valori

GitHub: https://github.com/varshith-Git/valori

PyPI: https://pypi.org/project/valori

I’d love to hear your thoughts:

- What’s missing for you in current vector DBs?
- If you’ve built LLM or RAG systems, what do you wish a lightweight, pure Python DB like this handled better?
- Would you prefer tighter integrations (LangChain, Haystack, etc.) or a more “build-it-yourself” style?

Feedback, criticism, or collaboration ideas are all welcome.

Varshith (varshith.gudur17@gmail.com)
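
For readers unfamiliar with the index types named in the post, the simplest of them, a flat index, can be sketched in a few lines of NumPy. This is not Valori's API or implementation, since the post shows neither. The class name `FlatIndex` and its `add` and `search` methods are hypothetical, chosen only to illustrate the idea. A flat index stores every vector and compares the query against all of them, so results are exact but search time grows linearly with the number of vectors.

```python
import numpy as np


class FlatIndex:
    """Brute-force cosine-similarity index (hypothetical, not Valori's API)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, vectors) -> None:
        vectors = np.asarray(vectors, dtype=np.float32).reshape(-1, self.dim)
        # Normalize each row so a dot product equals cosine similarity.
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        vectors = vectors / np.maximum(norms, 1e-12)
        self._vectors = np.vstack([self._vectors, vectors])

    def search(self, query, k: int = 5):
        query = np.asarray(query, dtype=np.float32).reshape(self.dim)
        query = query / max(float(np.linalg.norm(query)), 1e-12)
        # Score the query against every stored vector (exact search).
        scores = self._vectors @ query
        k = min(k, len(scores))
        top = np.argsort(-scores)[:k]  # indices of the k highest scores
        return top, scores[top]


index = FlatIndex(dim=3)
index.add([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
ids, scores = index.search([1.0, 0.1, 0.0], k=2)
print(ids.tolist())  # [0, 2]
```

Approximate indices such as HNSW or IVF trade some of this exactness for speed by examining only part of the stored set for each query.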