AU Patent Pending · 2026

Document search
at binary speed

NodeMind compresses float32 RAG indexes to 1/32 of their size with BGE-M3 (48× vs HNSW, 96× with BGE-base, up to 128× on multimodal data) using a patent-pending integer-only binary codec, then searches them in under 1 ms with pure-integer Hamming Multi-Index Hashing. No GPU, no vector database, no cloud bills.

32×
Smaller than float32 RAG
128×
Max compression (multimodal)
<1ms
Hamming search (CPU)
0
GPU required
$0
Vector DB bills

RAG float32 vs NodeMind Binary

A 1 GB text document becomes a 10 GB RAG float32 index — that's the real cost of vector search at scale. NodeMind's binary codec crushes that 10 GB down to just 210 MB online (or 32× smaller offline). Same documents. Same BGE-M3 embeddings. Dramatically different storage.

Why does RAG expand 10×? Chunking 1 KB of text produces a 1024-dim float32 vector = 4 KB, a 4× expansion on the raw text. HNSW graph index structures add another 2–3×. Result: every 1 GB of documents becomes ~10 GB in a vector database, consistent with Elasticsearch, Pure Storage, and Milvus benchmarks. NodeMind then compresses that 10 GB RAG index by 48× on text (up to 128× on multimodal data with NM-256) using our patent-pending binary codec.
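The ratios above reduce to a few lines of arithmetic. A back-of-envelope sketch in plain Python; the 1.5× HNSW overhead used for the 48× figure is an assumption matching the ~50% graph overhead quoted in the codec note, and real index sizes vary:

```python
# Back-of-envelope storage math for the expansion and compression ratios above.
DIM = 1024                      # BGE-M3 embedding dimensions
FLOAT32_BYTES = 4

vector_bytes = DIM * FLOAT32_BYTES   # 4096 B = 4 KB per 1 KB chunk (4x on raw text)
fingerprint_bytes = DIM // 8         # 1024-bit binary fingerprint = 128 B

vs_float32 = vector_bytes / fingerprint_bytes              # 32x vs raw float32
HNSW_OVERHEAD = 1.5                                        # assumption: ~50% graph overhead
vs_hnsw = vector_bytes * HNSW_OVERHEAD / fingerprint_bytes  # 48x vs an HNSW index

print(vs_float32, vs_hnsw)  # 32.0 48.0
```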

Original Documents → RAG Index (float32 · ~10× expansion) → NodeMind Index (binary · 48× smaller online)
— Storage Comparison

Documents          Chunks    RAG Index   NodeMind Index   vs RAG   RAG Storage/mo   NodeMind Storage/mo   Managed Vector DB/mo   Annual Savings
                                                                   (S3 Standard)    (S3 Standard)         (Pinecone pricing)
1 GB documents     ~250K     10 GB       210 MB           48×      $0.23            $0.0024               $25.00                 $300/yr
10 GB documents    ~2.5M     100 GB      2.1 GB           48×      $2.30            $0.024                $250.00                $3,000/yr
100 GB documents   ~25M      1 TB        21 GB            48×      $23.00           $0.24                 $2,500                 $30,000/yr
1 TB documents     ~250M     10 TB       210 GB           48×      $230             $2.40                 $25,000                $300,000/yr
— Search Performance

Both sides use the same 1024-dim BGE-M3 embeddings.

                      RAG (float32)                                    NodeMind (binary)
Search method         Cosine similarity — O(N·D) multiply-accumulate   Hamming distance on 1024-bit integers — POPCNT only, <1 ms
GPU required          Yes — needed for fast cosine at scale            No — pure CPU, any machine
RAM for 250M chunks   ~1 TB                                            ~10 GB
Offline / portable    No — requires live vector DB connection          Yes — download zip, run anywhere, no cloud needed
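For contrast with cosine similarity, the Hamming scoring in the comparison above reduces to XOR plus popcount. A minimal sketch with toy 16-bit codes, not NodeMind's production code (which uses 1024-bit fingerprints and MIH rather than brute force):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary fingerprints held as Python ints."""
    # XOR marks the differing bits; counting set bits is the POPCNT step.
    return bin(a ^ b).count("1")

# Two toy 16-bit fingerprints that differ in 3 bit positions:
a = 0b1010_1100_0011_0101
b = 0b1010_1100_0011_0010
print(hamming(a, b))  # 3

# Brute-force nearest neighbour over a small database of codes:
database = [0b1111_0000_1111_0000, a, 0b0000_1111_0000_1111]
query = b
nearest = min(range(len(database)), key=lambda i: hamming(query, database[i]))
print(nearest)  # 1 (a is the closest code to b)
```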

Codec: NodeMind's compression is not standard binary quantization (which breaks down on out-of-distribution queries). Our patent-pending algorithm is integer-only, deterministic, and produces fingerprints with recall that beats fixed-threshold binary baselines on real BEIR queries. This achieves 32× compression with BGE-M3 (1024-bit), 48× vs HNSW (incl. ~50% graph overhead), 96× with BGE-base (256-bit), and up to 128× on multimodal data (NM-256). Costs use S3 Standard at $0.023/GB/mo vs Pinecone managed vector DB at $2.50/GB/mo.

Note on benchmarks. Compression ratios are mathematical and verifiable with os.path.getsize() on the downloadable indexes — see the interactive benchmark page. Sub-1ms search latency holds at small/medium N; sub-linear scaling to ~12ms is documented in the patent for 100M-chunk indexes. On real out-of-distribution BEIR queries, NodeMind beats standard FAISS Fixed Binary on 3 of 4 datasets at the same compression, and stays within ~5pp of float32 cosine — the 32× / 48× / 96× / 128× compression numbers are the trade you make for that gap.
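The size check the note describes takes two calls to os.path.getsize(). The sketch below uses synthetic stand-in files; substitute the paths of the actual downloaded indexes:

```python
import os
import tempfile

def compression_ratio(rag_index: str, nm_index: str) -> float:
    """Ratio of on-disk sizes between a RAG float32 index and a NodeMind index."""
    return os.path.getsize(rag_index) / os.path.getsize(nm_index)

# Demo with synthetic files standing in for the two downloaded indexes
# (paths and sizes here are illustrative, not real benchmark artifacts):
with tempfile.TemporaryDirectory() as d:
    rag = os.path.join(d, "rag_index.bin")
    nm = os.path.join(d, "nm_index.bin")
    with open(rag, "wb") as f:
        f.write(b"\0" * 4096)   # 4 KB float32 stand-in
    with open(nm, "wb") as f:
        f.write(b"\0" * 128)    # 128 B binary stand-in
    print(compression_ratio(rag, nm))  # 32.0
```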

Built for every data type

The NodeMind codec is modality-agnostic — text, images, audio, tables, and code share the same patent-pending binary encoding. Every modality below is measured in the multimodal benchmark.

Text & Documents
32×–96×
PDF, TXT, Markdown. BGE-M3 → 1024-bit (32×); BGE-base + PCA-256 → 256-bit (96×). Recall@5 ≥ 0.999 on 500K-chunk Wikipedia/arXiv/Gutenberg corpus. Live on nodemind.space.
Images
32×–128×
Real Unsplash photos embedded with BGE-Visualized-M3. NM-1024 (32×) / NM-512 (64×) / NM-256 (128×) all hit Recall@1 = 1.000 vs Gemini RAG float32 baseline.
Audio
32×–128×
ESC-10 environmental clips (CC-BY 4.0) routed through Whisper transcription, then binarised with the same codec. Recall@1 = 1.000 across all three compression levels.
Tables & Code
32×–128×
Structured CSV / SQL / OHLCV tables and Python / SQL / Bash code. Same codec, same compression sweep, Recall@1 = 1.000. Video pipeline (transcript + frame embeddings) coming next.

All modality ratios are measured on real files — images are real Unsplash photos, audio is real ESC-10 environmental WAV files, tables and code are real structured data. See the multimodal benchmark for methodology, queries, and download links.

How NodeMind works

Three stages — embedding, binary encoding with our proprietary codec, and Multi-Index Hashing search. No gradients. No GPU. Pure integer arithmetic throughout.

1. Chunk document
2. BGE-M3 embed (1024-dim)
3. NodeMind binary codec
4. MIH index (64 sub-tables)
5. Hamming search → results

Patent-pending Binary Codec

Each float32 embedding is converted to a compact binary fingerprint using our patent-pending integer-only algorithm. This is not standard binary quantization (fixed-zero or per-vector mean) — our codec preserves semantic neighbourhood structure far better, so we beat fixed-threshold binary baselines on real BEIR queries. The full method is a trade secret protected under AU 2026904283.

  • 32× / 48× / 96× / 128× depending on encoder + bit width
  • Integer-only — no floats at indexing
  • Deterministic, single portable file
  • Patent AU 2026904283 (codec + index)
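The codec itself is a trade secret, but the fixed-threshold binary quantization it is benchmarked against is simple to show. A minimal fixed-zero baseline (assuming NumPy; the random vector is a stand-in for a real BGE-M3 embedding):

```python
import numpy as np

def fixed_zero_binarize(vec: np.ndarray) -> bytes:
    """Standard fixed-threshold binary quantization: one sign bit per component."""
    bits = (vec > 0).astype(np.uint8)     # 1 where the component is positive
    return np.packbits(bits).tobytes()    # 1024 bits -> 128 bytes

rng = np.random.default_rng(0)
embedding = rng.standard_normal(1024).astype(np.float32)  # stand-in for BGE-M3 output
fingerprint = fixed_zero_binarize(embedding)

print(len(embedding.tobytes()), len(fingerprint))  # 4096 128 (32x smaller)
```

This baseline achieves the same 32× size reduction but discards magnitude and distribution information, which is why it degrades on out-of-distribution queries.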

Multi-Index Hashing (MIH)

The 1024-bit fingerprint is split into 64 sub-strings of 16 bits. Each sub-string indexes into a hash table; at query time, exact matches plus radius-1 Hamming variants per sub-table are merged into a candidate set, then re-ranked by full Hamming distance. By the pigeonhole principle, any fingerprint within Hamming distance 127 of the query must agree with it to within 1 bit on at least one sub-string, so the search is exact for those neighbours while remaining sub-linear, with no approximate structures.

  • 64 hash sub-tables, 16 bits each
  • Sub-1ms query at small/medium N; ~12ms at 100M
  • Pure XOR + POPCNT — no FAISS, no HNSW, no ANN library
  • Patent AU 2026904283 (codec + index)
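A toy sketch of the MIH scheme described above, shrunk to 64-bit codes across 4 sub-tables so it fits in a few lines (the production configuration is 1024 bits across 64 tables; the codes below are arbitrary examples):

```python
from collections import defaultdict

BITS, TABLES = 64, 4          # toy sizes; production is 1024 bits / 64 tables
SUB = BITS // TABLES          # 16-bit sub-strings
MASK = (1 << SUB) - 1

def substrings(code: int):
    """Split a fingerprint into TABLES sub-strings of SUB bits each."""
    return [(code >> (i * SUB)) & MASK for i in range(TABLES)]

def build(codes):
    """One hash table per sub-string position, mapping sub-string -> doc ids."""
    tables = [defaultdict(list) for _ in range(TABLES)]
    for idx, code in enumerate(codes):
        for t, sub in enumerate(substrings(code)):
            tables[t][sub].append(idx)
    return tables

def search(query: int, codes, tables, k: int = 1):
    # Probe each table with the exact sub-string plus every radius-1 variant,
    # merge the candidates, then re-rank by full Hamming distance.
    candidates = set()
    for t, sub in enumerate(substrings(query)):
        for probe in [sub] + [sub ^ (1 << b) for b in range(SUB)]:
            candidates.update(tables[t].get(probe, ()))
    return sorted(candidates, key=lambda i: bin(query ^ codes[i]).count("1"))[:k]

codes = [0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFFF]
tables = build(codes)
print(search(0xDEADBEEFCAFEF00F, codes, tables))  # [0]: one bit from codes[0]
```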

BGE-M3 Embeddings

NodeMind uses BGE-M3, the state-of-the-art multilingual embedding model with 1024 dimensions. Dense, sparse, and multi-vector representations are supported. The model is loaded once per worker — no repeated downloads.

  • MTEB top-ranked multilingual model
  • 1024-dim dense vectors
  • Runs on community hardware (RTX 3080 + 128 GB RAM) — no datacenter required
  • Scales to zero when idle

Portable Index Files

After indexing, users download two zip files: the NodeMind binary index and a standard RAG float32 index. Both run completely offline using the included nodemind_local.py runner. No cloud subscription needed to query.

  • NodeMind zip: binary MIH index
  • RAG zip: float32 cosine index
  • Side-by-side benchmark built in
  • Auto-deleted after 24 hours
User uploads PDF
        │
        ▼
[ FastAPI — nodemind.space ]   ← nginx + SSL (Google Cloud VPS, 1TB)
        │
        ▼  submit job
[ Community Hardware: RTX 3080 + 128 GB RAM ]
  1. pdfplumber → chunks
  2. BGE-M3 → float32 embeddings (1024-dim)
  3. Patent-pending binary codec → 1024-bit fingerprints (32× smaller; up to 128× at 256-bit)
  4. MIH index: 64 sub-tables × 16-bit keys
  5. RAG index: float32 cosine (comparison baseline)
  6. Return nm_zip + rag_zip
        │
        ▼
[ VPS stores zips ]   ← auto-deleted after 24 hours
        │
        ▼
User downloads both — runs offline

Patent-protected technology

NodeMind's core algorithm is protected by an Australian provisional patent held by Sai Kiran Bathula, independent researcher, Coleambally NSW.

AU 2026904283 · Provisional
NodeMind Codec & Index
The proprietary integer-only encoding that converts float32 embeddings into compact binary fingerprints, plus the portable single-file binary fingerprint index format used at query time. Achieves 32× / 96× / 128× compression depending on encoder, with recall that beats fixed-threshold binary quantization on out-of-distribution BEIR queries. Sub-linear exact Hamming nearest-neighbour search on a single CPU core. Full method is a trade secret.

HiveMind — funded by NodeMind

We did the hard math on document compression. NodeMind is shipped. The next big bet is HiveMind — a public AI reasoning network where humans and agents leave compressed reasoning traces, register watches on ideas, surface contradictions, and connect tools through shared memory. Funded by NodeMind revenue.

See the HiveMind concept →   Follow @QLNI_AI for updates

Try NodeMind for free

No installation. No API key. Upload any PDF, TXT, or Markdown file at the live demo and get a portable binary index back in under 2 minutes.

01
Visit the demo
Go to nodemind.space and click Try Free. Enter your email — login is instant, no inbox check.
02
Upload a document
Drop any PDF, TXT, or Markdown file (10 MB per file, 50 MB lifetime per account). Community hardware (RTX 3080 + 128 GB RAM) picks it up, embeds with BGE-M3, and applies the NodeMind codec — typical 5,500-page PDF indexes in ~7 minutes.
03
Download your index
Once ready, download both the NodeMind binary index and the RAG float32 index. Run queries side-by-side offline.
04
Compare live
Use the Compare tab to run queries and see NodeMind vs RAG side by side — latency, index size, compression ratio, and result quality.
Open Live Demo →

Interested in licensing or collaboration?

NodeMind is built by a solo independent researcher. Reach out for licensing, enterprise integration, or research collaboration.

saikiranbathula1@gmail.com
Sai Kiran Bathula · Coleambally, NSW, Australia · Independent Researcher