NodeMind shrinks float32 RAG indexes 32× with BGE-M3, 48× vs HNSW, 96× with BGE-base, and up to 128× on multimodal data — using a patent-pending integer-only binary codec — then searches them in under 1 ms with pure-integer Hamming MIH. No GPU, no vector database, no cloud bills.
A 1 GB text document becomes a 10 GB RAG float32 index — that's the real cost of vector search at scale. NodeMind's binary codec crushes that 10 GB down to just 210 MB online (48× smaller; 32× offline, against the raw float32 embeddings alone). Same documents. Same BGE-M3 embeddings. Dramatically different storage.
Why does RAG expand 10×? Chunking 1 KB of text produces a 1024-dim float32 vector = 4 KB (4× on raw text). HNSW graph index structures add another 2–3×. Result: every 1 GB of documents becomes ~10 GB in a vector database — confirmed by Elasticsearch, Pure Storage, and Milvus benchmarks. NodeMind then compresses that 10 GB RAG index 48× further on text (up to 128× on multimodal data with NM-256) using our patent-pending binary codec.
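The arithmetic behind that ~10× figure, using only the numbers from the paragraph above:

```python
# Expansion from raw text to a graph-indexed float32 RAG index.
chunk_bytes = 1024                    # 1 KB of raw text per chunk
vector_bytes = 1024 * 4               # 1024-dim float32 vector = 4 KB, i.e. 4x on raw text
for hnsw_overhead in (2, 3):          # graph structures add another 2-3x
    print(f"{vector_bytes * hnsw_overhead / chunk_bytes:.0f}x expansion")
# prints 8x and 12x, hence "every 1 GB of documents becomes ~10 GB"
```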
**Storage Comparison**

| Original Documents | RAG Index (float32, ~10× expansion) | NodeMind Index (binary, 48× smaller online) | vs RAG | RAG Storage/mo (S3 Standard) | NodeMind Storage/mo (S3 Standard) | Managed Vector DB/mo (Pinecone pricing) | Annual Savings |
|---|---|---|---|---|---|---|---|
| 1 GB documents (~250K chunks) | 10 GB | 210 MB | 48× | $0.23/mo | $0.0024/mo | $25.00/mo | $300/yr |
| 10 GB documents (~2.5M chunks) | 100 GB | 2.1 GB | 48× | $2.30/mo | $0.024/mo | $250.00/mo | $3,000/yr |
| 100 GB documents (~25M chunks) | 1 TB | 21 GB | 48× | $23.00/mo | $0.24/mo | $2,500/mo | $30,000/yr |
| 1 TB documents (~250M chunks) | 10 TB | 210 GB | 48× | $230/mo | $2.40/mo | $25,000/mo | $300,000/yr |
**Search Performance**

| | RAG Index (float32) | NodeMind Index (binary) |
|---|---|---|
| Search method (same 1024-dim BGE-M3) | Cosine similarity on float32 — O(N·D) multiply-accumulate | Hamming distance on 1024-bit integers — POPCNT only, <1 ms |
| GPU required | Yes — needed for fast cosine at scale | No — pure CPU, any machine |
| RAM for 250M chunks | ~1 TB | ~10 GB |
| Offline / portable | No — requires live vector DB connection | Yes — download zip, run anywhere, no cloud needed |
Codec: NodeMind's compression is not standard binary quantization (which breaks down on out-of-distribution queries). Our patent-pending algorithm is integer-only, deterministic, and produces fingerprints with recall that beats fixed-threshold binary baselines on real BEIR queries. This achieves 32× compression with BGE-M3 (1024-bit), 48× vs HNSW (incl. ~50% graph overhead), 96× with BGE-base (256-bit), and up to 128× on multimodal data (NM-256). Costs use S3 Standard at $0.023/GB/mo vs Pinecone managed vector DB at $2.50/GB/mo.
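The headline ratios follow directly from the bit counts above. A quick check in Python, assuming BGE-base's standard 768 dimensions and (inferred from the 128× figure) 1024-dim embeddings for the NM-256 multimodal case:

```python
# Pure arithmetic behind the 32x / 48x / 96x / 128x claims.
FLOAT32_BITS = 32

def ratio(dims: int, fp_bits: int, graph_overhead: float = 1.0) -> float:
    return dims * FLOAT32_BITS * graph_overhead / fp_bits

print(ratio(1024, 1024))        # 32.0  -- BGE-M3, 1024-bit fingerprint
print(ratio(1024, 1024, 1.5))   # 48.0  -- vs HNSW with ~50% graph overhead
print(ratio(768, 256))          # 96.0  -- BGE-base (768-dim), 256-bit fingerprint
print(ratio(1024, 256))         # 128.0 -- NM-256 (assumes 1024-dim multimodal embeddings)
```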
Note on benchmarks. Compression ratios are mathematical and verifiable with os.path.getsize() on the downloadable indexes — see the interactive benchmark page. Sub-1ms search latency holds at small/medium N; sub-linear scaling to ~12ms is documented in the patent for 100M-chunk indexes. On real out-of-distribution BEIR queries, NodeMind beats standard FAISS Fixed Binary on 3 of 4 datasets at the same compression, and stays within ~5pp of float32 cosine — the 32× / 48× / 96× / 128× compression numbers are the trade you make for that gap.
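As the note says, the ratio is checkable with nothing more than file sizes once the indexes are downloaded; the file names below are placeholders, not NodeMind's actual layout:

```python
import os

rag_bytes = os.path.getsize("rag_index.zip")       # placeholder name: float32 RAG index
nm_bytes = os.path.getsize("nodemind_index.zip")   # placeholder name: NodeMind binary index
print(f"compression: {rag_bytes / nm_bytes:.1f}x")
```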
The NodeMind codec is modality-agnostic — text, images, audio, tables, and code share the same patent-pending binary encoding. Every modality below is measured in the multimodal benchmark.
All modality ratios are measured on real files — images are real Unsplash photos, audio is real ESC-10 environmental WAV files, tables and code are real structured data. See the multimodal benchmark for methodology, queries, and download links.
Three stages — embedding, binary encoding with our proprietary codec, and Multi-Index Hashing search. No gradients. No GPU at query time. Pure integer arithmetic from encoding onward.
Each float32 embedding is converted to a compact binary fingerprint using our patent-pending integer-only algorithm. This is not standard binary quantization (fixed-zero or per-vector mean) — our codec preserves semantic neighbourhood structure far better, so we beat fixed-threshold binary baselines on real BEIR queries. The full method is a trade secret protected under AU 2026904283.
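For contrast, here is what the standard binary quantization baselines named above (fixed-zero and per-vector mean thresholds) look like. This is the approach NodeMind claims to beat, not the codec itself, which remains undisclosed:

```python
import numpy as np

def fixed_zero_binarize(vecs: np.ndarray) -> np.ndarray:
    """Fixed-zero baseline: bit = 1 wherever the component is positive."""
    return np.packbits(vecs > 0, axis=-1)

def per_vector_mean_binarize(vecs: np.ndarray) -> np.ndarray:
    """Per-vector mean baseline: bit = 1 wherever the component exceeds its vector's mean."""
    return np.packbits(vecs > vecs.mean(axis=-1, keepdims=True), axis=-1)

emb = np.random.randn(4, 1024).astype(np.float32)  # 4 KB per vector
fp = fixed_zero_binarize(emb)                      # shape (4, 128): 128 bytes, 32x smaller
```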
The 1024-bit fingerprint is split into 64 sub-strings of 16 bits. Each sub-string indexes into a hash table; at query time, exact matches plus radius-1 Hamming variants per sub-table are merged into a candidate set, then re-ranked by full Hamming distance. Sub-linear exact nearest-neighbour search — no approximate structures.
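A minimal sketch of the MIH scheme just described, assuming fingerprints are Python ints; the names and the pure-Python hash tables are illustrative, not NodeMind's patented implementation:

```python
# Minimal MIH sketch: 64 sub-tables keyed by 16-bit sub-strings, radius-1 probing,
# then a full-Hamming re-rank of the merged candidate set.
from collections import defaultdict

NUM_TABLES, SUB_BITS = 64, 16        # 64 x 16 bits = 1024-bit fingerprint
MASK = (1 << SUB_BITS) - 1

def sub_keys(fp: int) -> list[int]:
    """Split a 1024-bit fingerprint into 64 16-bit sub-string keys."""
    return [(fp >> (i * SUB_BITS)) & MASK for i in range(NUM_TABLES)]

class MIHIndex:
    def __init__(self) -> None:
        self.tables = [defaultdict(list) for _ in range(NUM_TABLES)]
        self.fingerprints: list[int] = []

    def add(self, fp: int) -> None:
        self.fingerprints.append(fp)
        for t, key in enumerate(sub_keys(fp)):
            self.tables[t][key].append(len(self.fingerprints) - 1)

    def search(self, query: int, k: int = 10) -> list[int]:
        # Candidate set: exact sub-string matches plus every radius-1 variant.
        cands: set[int] = set()
        for t, key in enumerate(sub_keys(query)):
            cands.update(self.tables[t].get(key, ()))
            for bit in range(SUB_BITS):              # flip each of the 16 bits
                cands.update(self.tables[t].get(key ^ (1 << bit), ()))
        # Re-rank by full Hamming distance (int.bit_count is POPCNT, Python 3.10+).
        return sorted(cands, key=lambda i: (self.fingerprints[i] ^ query).bit_count())[:k]
```

By the pigeonhole principle, any fingerprint within Hamming distance 127 of the query must agree with it to within distance 1 on at least one of the 64 sub-strings, so the candidate set provably contains all such neighbours; the search is exact, not approximate.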
NodeMind uses BGE-M3, the state-of-the-art multilingual embedding model with 1024 dimensions. Dense, sparse, and multi-vector representations are supported. The model is loaded once per worker — no repeated downloads.
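One plausible way to run this stage is via the public FlagEmbedding package (the model id is BAAI/bge-m3); NodeMind's actual loader is not published and may differ:

```python
# Embedding-stage sketch using FlagEmbedding (pip install FlagEmbedding).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)   # load once per worker
out = model.encode(
    ["NodeMind compresses RAG indexes.", "Hamming search is integer-only."],
    return_dense=True,   # dense 1024-dim vectors; sparse/multi-vector also available
)
dense_vecs = out["dense_vecs"]   # array of shape (2, 1024)
```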
After indexing, users download two zip files: the NodeMind binary index and a standard RAG float32 index. Both run completely offline using the included nodemind_local.py runner. No cloud subscription needed to query.
```
User uploads PDF
        │
        ▼
[ FastAPI — nodemind.space ]  ← nginx + SSL (Google Cloud VPS, 1 TB)
        │
        ▼  submit job
[ Community Hardware: RTX 3080 + 128 GB RAM ]
    1. pdfplumber → chunks
    2. BGE-M3 → float32 embeddings (1024-dim)
    3. Patent-pending binary codec → 1024-bit fingerprints (32× smaller; up to 128× at 256-bit)
    4. MIH index: 64 sub-tables × 16-bit keys
    5. RAG index: float32 cosine (comparison baseline)
    6. Return nm_zip + rag_zip
        │
        ▼
[ VPS stores zips ]  ← auto-deleted after 24 hours
        │
        ▼
User downloads both — runs offline
```
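A minimal sketch of step 1 (pdfplumber → chunks), assuming naive fixed-size character chunking; NodeMind's production chunker is not published and may differ:

```python
import pdfplumber

def pdf_to_chunks(path: str, chunk_chars: int = 1024) -> list[str]:
    """Extract all page text from a PDF and split it into fixed-size chunks."""
    with pdfplumber.open(path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```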
NodeMind's core algorithm is protected by an Australian provisional patent held by Sai Kiran Bathula, independent researcher, Coleambally NSW.
No installation. No API key. Upload any PDF, TXT, or Markdown file at the live demo and get a portable binary index back in under 2 minutes.
NodeMind is built by a solo independent researcher. Reach out for licensing, enterprise integration, or research collaboration.
saikiranbathula1@gmail.com