Real-World Benchmark
500,000 chunks · Wikipedia + arXiv + Project Gutenberg · NVIDIA A40 · Independently verifiable — download and run it yourself.
Corpus & Methodology
Mixed real-world corpus across three domains — general knowledge, scientific literature, and long-form prose.
| Source | Domain | Size | Description |
|---|---|---|---|
| Wikipedia (Simple English) | General knowledge | ~100 MB | Encyclopedia articles |
| arXiv papers | Science / ML | ~40 MB | CS & ML abstracts + intros |
| Project Gutenberg | Literature | ~28 MB | Public domain books |
| Total | Mixed | ~168 MB raw | 642,939 paragraphs → 500,000 chunks |
Chunking: 400 words per chunk, 50-word overlap. Embedding: BAAI/bge-m3 (1024-dim) and BGE-base (768-dim, optionally PCA-reduced to 256) on an NVIDIA A40. Recall: 1,000 queries scored against exact cosine top-k ground truth.
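For reference, a minimal sketch of the chunking scheme above (400-word windows, 50-word overlap); the real pipeline's paragraph boundaries and tokenisation may differ.

```python
def chunk_words(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows: 400 words, 50-word overlap."""
    words = text.split()
    if not words:
        return []
    step = size - overlap  # advance 350 words per chunk
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```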
Retrieval Accuracy
BGE-M3 · 1024-bit binary fingerprints
| Metric | NodeMind MIH | Ground Truth |
|---|---|---|
| Recall@1 | 0.999 | 1.000 |
| Recall@3 | 0.999 | 1.000 |
| Recall@5 | 1.000 | 1.000 |
| Recall@10 | 1.000 | 1.000 |
| Recall@20 | 1.000 | 1.000 |
| MRR@10 | 0.9992 | 1.000 |
BGE-base · 768-bit and 256-bit (PCA)
| Metric | 768-bit | 256-bit (PCA) |
|---|---|---|
| Recall@1 | 0.999 | 1.000 |
| Recall@5 | 1.000 | 1.000 |
| Recall@10 | 1.000 | 1.000 |
| MRR@10 | 0.9995 | 1.000 |
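For completeness, here is one common way to compute these metrics; a hedged sketch assuming `pred` holds the MIH result indices per query and `gt` the exact cosine ranking (both array names are illustrative, not taken from the released files).

```python
import numpy as np

def recall_at_k(pred: np.ndarray, gt: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact-cosine top-1 item appears in the first k results."""
    return float(np.mean([gt[q, 0] in pred[q, :k] for q in range(len(gt))]))

def mrr_at_10(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean reciprocal rank of the exact-cosine top-1 item within the first 10 results."""
    rr = []
    for q in range(len(gt)):
        hits = np.where(pred[q, :10] == gt[q, 0])[0]
        rr.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(rr))
```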
Index Size — 500,000 Chunks
| Index | Size | Bytes / chunk | Compression vs float32 baseline |
|---|---|---|---|
| NodeMind BGE-M3 (1024-bit) | 64 MB | 128 B | 32× smaller |
| Float32 RAG, BGE-M3 (baseline) | 2,048 MB | 4,096 B | 1× (reference) |
| HNSW index (float32 + ~1.5× graph overhead) | 3,072 MB | 6,144 B | 1.5× larger (48× vs NodeMind) |
| NodeMind BGE-base 256-bit (PCA) | 16 MB | 32 B | 96× smaller (vs its 1,536 MB BGE-base float32 baseline) |
Index only — document text stored separately and equally in all systems.
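A quick sanity check of the per-chunk figures (all numbers taken from the table above):

```python
chunks = 500_000
fp_bytes = 1024 // 8    # 1024-bit fingerprint → 128 B per chunk
f32_bytes = 1024 * 4    # 1024-dim float32 embedding → 4,096 B per chunk
print(chunks * fp_bytes / 1e6)   # 64.0   MB (NodeMind BGE-M3)
print(chunks * f32_bytes / 1e6)  # 2048.0 MB (float32 baseline)
print(f32_bytes // fp_bytes)     # 32 → 32× smaller
```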
Download — Verify It Yourself
All files generated from the same 500,000 chunks. Download NodeMind + float32 RAG side by side to verify compression ratios yourself.
Verify compression in Python
```python
# pip install sentence-transformers  (only needed for querying, not verification)
import pickle

with open("nm_bgem3_index.pkl", "rb") as f:
    nm = pickle.load(f)
with open("rag_bgem3_index.pkl", "rb") as f:
    rag = pickle.load(f)

nm_mb = nm["fps"].nbytes / 1e6           # → 64
rag_mb = rag["embeddings"].nbytes / 1e6  # → 2048
ratio = rag["embeddings"].nbytes // nm["fps"].nbytes  # → 32
print(f"NodeMind : {nm_mb:.0f} MB")
print(f"Float32  : {rag_mb:.0f} MB")
print(f"Ratio    : {ratio}×")

# BGE-base 256-bit index (96× vs its own float32 baseline)
with open("nm_bgebase256_index.pkl", "rb") as f:
    nm96 = pickle.load(f)
print(nm96["fps"].nbytes / 1e6)  # → 16; nm96["fps"] shape: (500000, 32)
# float32 baseline for BGE-base: 500,000 × 768 × 4 B = 1,536 MB → 96×
```
Run a query
```python
import pickle
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

with open("nm_bgem3_index.pkl", "rb") as f:
    nm = pickle.load(f)
fps = nm["fps"]
with open("corpus.pkl", "rb") as f:
    corpus = pickle.load(f)
chunks = corpus["chunks"]

# 256-entry lookup table: popcount of every possible byte value
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.int32)

def query_nodemind(text, top_k=5):
    emb = model.encode([text], normalize_embeddings=True)[0]
    q_fp = _binarise(emb, nm)  # binarisation uses index metadata (patent-protected)
    # Hamming distance: XOR against all fingerprints, popcount each byte, sum per row
    dists = POPCOUNT[np.bitwise_xor(fps, q_fp[np.newaxis, :])].sum(axis=1)
    top = np.argsort(dists)[:top_k]
    return [(int(dists[i]), chunks[i][:120]) for i in top]

for dist, text in query_nodemind("What is quantum entanglement?"):
    print(f"  [{dist:4d}] {text}")
```
The `_binarise` function uses the metadata stored in the .pkl file. The exact method is covered by AU 2026901656; the index is self-contained, so you do not need the patent text to query it.
How It Works
1. Embed. Text is chunked and embedded with a sentence model (BGE-M3 or BGE-base), producing one float32 vector per chunk.
2. Binarise. Each embedding is converted to a compact binary fingerprint using pre-computed index metadata. Integer-only; no GPU at query time. The method is patent-protected (AU 2026901656).
3. Index. Binary fingerprints are stored in a Multi-Index Hash (MIH) structure; queries find candidates by Hamming distance using pure integer arithmetic on any CPU (a runnable sketch follows this list). MIH structure: Norouzi et al., CVPR 2012. Novel contribution (AU 2026901657): CTV binarisation plus a portable single-file format.
4. Query. Embed the query, binarise it, run the Hamming search. A single .pkl file: no server, no Docker, no external database.
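For intuition, here is a minimal multi-index hashing lookup in the spirit of Norouzi et al. (CVPR 2012). The `sign_binarise` stand-in is NOT NodeMind's patented CTV method, just a generic placeholder so the sketch runs end to end.

```python
import numpy as np
from collections import defaultdict

POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.int32)

def sign_binarise(emb: np.ndarray) -> np.ndarray:
    """Generic stand-in: 1 bit per dimension, packed to uint8. NOT the CTV method."""
    return np.packbits(emb > 0)

class MultiIndexHash:
    """Split each fingerprint into m byte substrings; one hash table per substring."""
    def __init__(self, fps: np.ndarray, m: int = 8):
        self.fps, self.m = fps, m
        self.width = fps.shape[1] // m  # bytes per substring
        self.tables = [defaultdict(list) for _ in range(m)]
        for i, fp in enumerate(fps):
            for t in range(m):
                self.tables[t][fp[t * self.width:(t + 1) * self.width].tobytes()].append(i)

    def query(self, q_fp: np.ndarray, top_k: int = 5):
        # Pigeonhole: any fingerprint within Hamming distance m-1 of the query
        # matches it exactly on at least one substring, so it lands in `cand`.
        cand = set()
        for t in range(self.m):
            key = q_fp[t * self.width:(t + 1) * self.width].tobytes()
            cand.update(self.tables[t].get(key, []))
        idx = np.fromiter(cand, dtype=np.int64) if cand else np.arange(len(self.fps))
        dists = POPCOUNT[np.bitwise_xor(self.fps[idx], q_fp)].sum(axis=1)
        order = np.argsort(dists)[:top_k]
        return [(int(dists[i]), int(idx[i])) for i in order]
```

With 1024-bit codes and m = 8, exact bucket matches alone are guaranteed to surface every neighbour within Hamming distance 7; reaching more distant neighbours requires the bucket-probing expansion described in the paper.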
Honest Caveats
- Self-retrieval benchmark. Queries are perturbed corpus chunks — optimistic for binary methods. End-to-end QA on BEIR / MS MARCO not yet measured.
- HNSW comparison is size-only. Real FAISS HNSW achieves recall@10 ≈ 0.95–0.99 with graph traversal. A direct neutral head-to-head on a held-out set has not been run yet.
- 96× requires a lighter model. BGE-base + PCA-256 gives 96× vs its own float32 baseline. BGE-M3 (stronger, cross-lingual) gives 32× vs float32 and 48× vs HNSW.
- Text-only corpus. Tables, code blocks, and multi-modal documents were not tested.
- 2 GB download for float32 baseline. Budget the bandwidth if you want to verify baseline sizes yourself.
Patents
AU 2026901656 — WHT Integer Codec: integer-only binarisation without learned projection.
AU 2026901657 — NodeMind Centroid MIH: CTV-based binary fingerprinting + MIH search.
Filed IP Australia, May 2026. Built in Coleambally, NSW, Australia.