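Valori: a Python-native vector database I built from scratch
I've been working on a project called Valori, a Python-native vector database I built from the ground up. I did not reinvent every algorithm; instead, I wired efficient, well-known indexing and search techniques together into one cohesive, hackable framework.
The idea came from my frustration with existing vector DBs, which were either too heavy for experimentation or too opaque to modify. I wanted something simple, modular, and extensible, so I built it.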
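What it does:
- Lets you store, index, and search high-dimensional vectors
- Supports multiple index types (Flat, HNSW, IVF, LSH, Annoy)
- Has memory, disk, and hybrid storage backends
- Includes a full document processing pipeline (parsing, cleaning, chunking, embedding)
- Offers quantization, persistence, and plugin-based extensibility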
All of it is written in Python, integrates with NumPy, and has been production-tested, with logging and monitoring built in.
Install:
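For anyone unfamiliar with the index types listed above: a Flat index is exact brute-force search, and the others (HNSW, IVF, LSH, Annoy) trade some accuracy for speed. Here is a minimal NumPy sketch of the Flat case. It is illustrative only and is not Valori's actual API; the function name `flat_search` and the choice of cosine similarity are assumptions made for this example.

```python
import numpy as np


def flat_search(vectors: np.ndarray, query: np.ndarray, k: int = 3):
    """Exact nearest-neighbor search by cosine similarity (hypothetical helper).

    Assumes every stored vector and the query have non-zero norm.
    """
    # Normalize rows so that a plain dot product equals cosine similarity.
    normalized = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)

    # Score the query against every stored vector: this is the "flat" scan.
    scores = normalized @ q

    # Keep the k best matches, highest similarity first.
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy data: 1000 random 64-dimensional vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)

# Query with a slightly perturbed copy of vector 42.
noise = 0.01 * rng.normal(size=64).astype(np.float32)
query = vectors[42] + noise

ids, scores = flat_search(vectors, query, k=5)
```

The scan costs one pass over every stored vector per query, which is why approximate indexes become attractive as the collection grows.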
```
pip install valori
```
GitHub: [https://github.com/varshith-Git/valori](https://github.com/varshith-Git/valori)
PyPI: [https://pypi.org/project/valori](https://pypi.org/project/valori)
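I'd love to hear your thoughts:
What's missing for you in current vector DBs?
If you've built LLM or RAG systems, what do you wish a lightweight, pure Python DB like this handled better?
Would you prefer tighter integrations (LangChain, Haystack, etc.) or a more "build-it-yourself" style?
Feedback, criticism, or collaboration ideas are all welcome.
Varshith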
(varshith.gudur17@gmail.com)