Benchmarks

Three reproducible benchmarks · Real-world text · Multimodal · Official BEIR qrels · Independently verifiable — download and run it yourself.

Jump to: Real-World (500K chunks)  ·  Multimodal (Text · Image · Audio · Table · Code)  ·  BEIR-Combined (75K chunks · official qrels)

32× vs float32 RAG   ·   48× vs HNSW index   ·   96× BGE-base 256-bit   ·   1.000 Recall@10

1. Real-World Benchmark — 500,000 Chunks

Mixed corpus: Wikipedia + arXiv + Project Gutenberg. Embedded with BAAI/bge-m3 on NVIDIA A40.

Corpus & Methodology

Mixed real-world corpus across three domains — general knowledge, scientific literature, and long-form prose.

Source                     | Domain            | Size        | Description
Wikipedia (Simple English) | General knowledge | ~100 MB     | Encyclopedia articles
arXiv papers               | Science / ML      | ~40 MB      | CS & ML abstracts + intros
Project Gutenberg          | Literature        | ~28 MB      | Public domain books
Total                      | Mixed             | ~168 MB raw | 642,939 paragraphs → 500,000 chunks

Chunking: 400 words / chunk, 50-word overlap. Embedding: BAAI/bge-m3 (1024-dim) on NVIDIA A40. Recall: 1,000 queries vs exact cosine top-k ground truth.
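A minimal sketch of that chunking scheme (illustrative helper, not the benchmark's actual code):

def chunk_words(text, size=400, overlap=50):
    # 400-word windows, advancing 350 words per step → 50-word overlap
    words = text.split()
    step  = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]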

Self-retrieval protocol — queries are perturbed corpus chunks. Optimistic for binary methods. See Caveats below.

Retrieval Accuracy

BGE-M3 · 1024-bit binary fingerprints

Metric    | NodeMind MIH | Ground Truth
Recall@1  | 0.999        | 1.000
Recall@3  | 0.999        | 1.000
Recall@5  | 1.000        | 1.000
Recall@10 | 1.000        | 1.000
Recall@20 | 1.000        | 1.000
MRR@10    | 0.9992       | 1.000

BGE-base · 768-bit and 256-bit (PCA)

Metric    | 768-bit | 256-bit (PCA)
Recall@1  | 0.999   | 1.000
Recall@5  | 1.000   | 1.000
Recall@10 | 1.000   | 1.000
MRR@10    | 0.9995  | 1.000

Index Size — 500,000 Chunks

Index                               | Size     | Bytes / chunk | vs float32
NodeMind BGE-M3 (1024-bit)          | 64 MB    | 128 B         | 32×
Float32 RAG — BGE-M3 (baseline)     | 2,048 MB | 4,096 B       | 1× (reference)
HNSW index (float32 × 1.5 overhead) | 3,072 MB | 6,144 B       | 48× vs NodeMind
NodeMind BGE-base 256-bit (PCA)     | 16 MB    | 32 B          | 96× (vs BGE-base float32)

Index only — document text stored separately and equally in all systems.

Download — Verify It Yourself

All files generated from the same 500,000 chunks. Download NodeMind + float32 RAG side by side to verify compression ratios yourself.

nm_bgem3_index.pkl (64 MB) — NodeMind BGE-M3 binary fingerprint index; 32× smaller than float32
rag_bgem3_index.pkl (2,048 MB) — float32 RAG baseline; verify the 32× compression yourself
hnsw_size_reference.txt (<1 KB) — HNSW = float32 × 1.5; formula and explanation behind the 48× number
nm_bgebase256_index.pkl (16 MB) — NodeMind BGE-base 256-bit PCA index; 96× smaller than float32
corpus.pkl (~144 MB) — 500K text chunks; shared source for all indexes
NodeMind_RealWorld_Benchmark.pdf (~1 MB) — full benchmark report with methodology, tables, and caveats

Verify compression in Python

# pip install sentence-transformers (only needed for query, not verification)
import pickle

with open("nm_bgem3_index.pkl", "rb") as f: nm  = pickle.load(f)
with open("rag_bgem3_index.pkl","rb") as f: rag = pickle.load(f)

nm_mb  = nm["fps"].nbytes         / 1e6   # → 64
rag_mb = rag["embeddings"].nbytes  / 1e6   # → 2048
ratio  = rag["embeddings"].nbytes // nm["fps"].nbytes   # → 32

print(f"NodeMind : {nm_mb:.0f} MB")
print(f"Float32  : {rag_mb:.0f} MB")
print(f"Ratio    : {ratio}×")

# BGE-base 256-bit (96×)
with open("nm_bgebase256_index.pkl","rb") as f: nm96 = pickle.load(f)
# nm96["fps"] shape: (500000, 32)  →  16 MB
# float32 baseline: 500000 × 768 × 4 = 1,536 MB  →  96×

Run a query

import numpy as np
from sentence_transformers import SentenceTransformer

model  = SentenceTransformer("BAAI/bge-m3")
fps    = nm["fps"]
with open("corpus.pkl","rb") as f: corpus = pickle.load(f)
chunks = corpus["chunks"]

POPCOUNT = np.array([bin(i).count('1') for i in range(256)], dtype=np.int32)

def query_nodemind(text, top_k=5):
    emb  = model.encode([text], normalize_embeddings=True)[0]
    q_fp = _binarise(emb, nm)  # binarisation uses index metadata (patent-protected)
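    # XOR + popcount lookup = Hamming distance from the query to every stored fingerprint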
    dists = POPCOUNT[np.bitwise_xor(fps, q_fp[np.newaxis, :])].sum(axis=1)
    top   = np.argsort(dists)[:top_k]
    return [(int(dists[i]), chunks[i][:120]) for i in top]

for dist, text in query_nodemind("What is quantum entanglement?"):
    print(f"  [{dist:4d}] {text}")

The _binarise function applies the metadata stored in the .pkl file; the exact method is covered by AU 2026904283. The index is self-contained — you can query it without reading the patent.

2. Multimodal Benchmark — Text · Image · Audio · Table · Code

200 items across 5 modalities · BGE-Visualized-M3 (joint text+image, 1024-dim) vs Gemini embedding-2 float32 RAG · NVIDIA A40 · 50 ground-truth-mapped queries.

32× NM-1024 vs float32   ·   128× NM-256 vs float32   ·   5 modalities tested   ·   1.000 Recall@1 on every variant

Corpus & Methodology

Real files across five modalities — text chunks, photographs, environmental audio clips, structured tables, and source code.

Modality | Items | Queries | Description
Text     | 50    | 10      | 5 topics × 10 chunks — astronomy, biology, computing, medicine, climate
Image    | 50    | 10      | 5 categories × 10 real Unsplash photos — cats, dogs, cars, food, nature
Audio    | 40    | 10      | ESC-10 clips — dog_bark, rain, sea_waves, crackling_fire (CC-BY 4.0)
Table    | 30    | 10      | 3 types × 10 unique tables — company / city / portfolio identifiers
Code     | 30    | 10      | 10 Python algorithms + 10 SQL queries + 10 Bash scripts
Total    | 200   | 50      | BGE-Visualized-M3 1024-dim · NVIDIA A40 (46 GB)

RAG baseline: gemini-embedding-2 float32 + HNSW (paid API). NodeMind: bge-visualized-m3 (MIT, self-hosted). Audio routed through Whisper transcription before embedding. Recall measured against 1-to-1 ground truth.
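A minimal sketch of that audio routing, assuming the openai-whisper package (model size and file name are illustrative; plain bge-m3 stands in here for the BGE-Visualized-M3 model actually used):

import whisper
from sentence_transformers import SentenceTransformer

asr  = whisper.load_model("base")                       # any Whisper size; "base" is illustrative
text = asr.transcribe("audio/dog_bark.wav")["text"]     # audio → transcript

enc = SentenceTransformer("BAAI/bge-m3")                # stand-in for BGE-Visualized-M3
emb = enc.encode([text], normalize_embeddings=True)[0]  # transcript embeds like any text chunk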

Self-retrieval protocol — each query has exactly one correct corpus item. Optimistic for retrieval methods. See Caveats below.

Retrieval Accuracy — All Variants vs Gemini RAG

Variant                       | Recall@1 | Recall@5 | Recall@10 | MRR@10
NodeMind NM-1024 (32×)        | 1.0000   | 1.0000   | 1.0000    | 1.0000
NodeMind NM-512 (64×)         | 1.0000   | 1.0000   | 1.0000    | 1.0000
NodeMind NM-256 (128×)        | 1.0000   | 1.0000   | 1.0000    | 1.0000
Gemini RAG float32 (baseline) | 1.0000   | 1.0000   | 1.0000    | 1.0000

All 3 NodeMind variants match the float32 baseline exactly across all 50 queries.

Per-Modality Recall@1

Modality | Queries | NM-1024 | NM-512 | NM-256 | Gemini RAG
Text     | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Image    | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Audio    | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Table    | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Code     | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000

Index Size — Production Scale (500,000 chunks)

Index                                      | Size     | Bytes / chunk | Compression
NodeMind NM-1024 (1024-bit)                | 64 MB    | 128 B         | 32×
NodeMind NM-512 (512-bit)                  | 32 MB    | 64 B          | 64×
NodeMind NM-256 (256-bit)                  | 16 MB    | 32 B          | 128×
Float32 RAG (Gemini embedding-2, 1024-dim) | 2,048 MB | 4,096 B       | 1× (reference)

Same compression math as real-world benchmark — bytes per fingerprint × N chunks. Float32 RAG baseline is identical at production scale regardless of modality.
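Spelled out, the math is pure arithmetic — no downloads needed:

N = 500_000                                  # chunks at production scale
f32_mb = 1024 * 4 * N / 1e6                  # 1024-dim float32 baseline → 2,048 MB
for bits, name in [(1024, "NM-1024"), (512, "NM-512"), (256, "NM-256")]:
    mb = (bits // 8) * N / 1e6
    print(f"{name}: {bits // 8} B/chunk × {N:,} = {mb:.0f} MB  →  {f32_mb / mb:.0f}×")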

Download — Verify It Yourself

All artefacts from the 200-item corpus. Includes the indexes, the corpus itself, real images and audio, and a side-by-side query script.

NodeMind_Multimodal_Benchmark_FINAL.pdf (578 KB) — 11-page benchmark report: per-modality recall, query/result pairs, methodology
nodemind_index.pkl (924 KB) — all 3 binary variants in one file; verify 32× / 64× / 128× yourself
gemini_rag_index.pkl (851 KB) — float32 baseline: Gemini embedding-2 vectors
gemini_hnsw.bin (830 KB) — HNSW graph built on Gemini float32 vectors
corpus.json (67 KB) — all 200 items: text, image captions, audio labels
queries.json (8 KB) — 50 benchmark queries with ground-truth item IDs
images/ (50 photos, 1.7 MB) — 50 verified Unsplash JPGs: cats, dogs, cars, food, nature
audio/ (40 clips, 17 MB) — 40 real ESC-10 environmental WAV files (CC-BY 4.0)
query_demo.py (16 KB) — run any query, see NodeMind vs Gemini RAG side by side

Verify compression in Python

import os

nm_bytes  = os.path.getsize("nodemind_index.pkl")
rag_bytes = os.path.getsize("gemini_rag_index.pkl")

print(f"NodeMind index  : {nm_bytes  / 1e6:.3f} MB")
print(f"Gemini RAG index: {rag_bytes / 1e6:.3f} MB")
print(f"RAG is {rag_bytes // nm_bytes}× larger than NodeMind")

# At production scale — 500,000 chunks:
# NodeMind NM-1024  =    64 MB   →   32× smaller
# NodeMind NM-512   =    32 MB   →   64× smaller
# NodeMind NM-256   =    16 MB   →  128× smaller
# Float32 RAG       = 2,048 MB   (1× reference)

Run a query

# pip install flagembedding hnswlib numpy
# wget https://huggingface.co/BAAI/BGE-Visualized/resolve/main/Visualized_m3.pth -O bge-visualized-m3.pth

# NodeMind only — no API key needed
python query_demo.py --no-rag --query "dog barking loudly outdoors"

# NodeMind vs Gemini RAG side by side
export GEMINI_API_KEY=your_key_here
python query_demo.py --query "orange tabby cat green eyes"
python query_demo.py --query "crackling fire campfire sound"
python query_demo.py --query "SELECT with JOIN employees departments"

The NodeMind binarisation method is patent-protected (AU 2026904283). The .pkl indexes are self-contained — they work without reading the patent.

3. BEIR-Combined Benchmark — 75,128 Chunks · 2,677 Queries · Official Qrels

Four BEIR datasets concatenated into one corpus — NFCorpus, SciFact, ArguAna, FiQA. Recall measured against official BEIR judgements, not self-retrieval. End-to-end QA retrieval.

Why this benchmark

Standard RAG (Pinecone, Weaviate, FAISS-flat) has to keep 308 MB of float32 vectors in RAM for as long as it serves queries against this 75K corpus. NodeMind doesn't — once the index is built, you can delete the float32 vectors entirely. All you keep is the 9.6 MB binary fingerprint index (or 2.4 MB at 96×).

For non-technical readers: if your AI assistant today needs 300 MB of "memory" per book, NodeMind needs 10 MB for the same book — same query, same answers. At a million documents that's "fits on a laptop" vs "$5,000/mo Pinecone bill".

Corpus & Methodology

Source           | Chunks | Queries | Qrels
NFCorpus         | 3,633  | 323     | 12,334
SciFact          | 5,183  | 300     | 339
ArguAna          | 8,674  | 1,406   | 1,406
FiQA             | 57,638 | 648     | 1,706
Total (combined) | 75,128 | 2,677   | 15,785

Embedded with BGE-M3 (1024-dim) and BGE-base-en-v1.5 (768-dim). Hardware: NVIDIA A40. Recall formula: hits / |relevant_set|.
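A minimal sketch of that recall formula (hypothetical doc IDs; the official qrels map each query to its set of relevant documents):

def recall_at_k(retrieved, relevant, k=10):
    # hits / |relevant_set| — the formula stated above
    return len(set(retrieved[:k]) & relevant) / len(relevant)

# 3 of the 5 judged-relevant docs appear in the top 10 → 0.6
print(recall_at_k(["d1", "d7", "d2", "d9", "d4"], {"d1", "d2", "d4", "d8", "d99"}))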

Retrieval Accuracy — Cell 32× / 48× (BGE-M3, 1024-bit)

Method                              | R@10  | NDCG@10 | MRR@10 | Index size
Float32 cosine (exact)              | 0.629 | 0.402   | 0.375  | 308 MB
HNSW float32 (FAISS HNSW32)         | 0.619 | 0.396   | 0.369  | ~462 MB
NodeMind B (no PCA)                 | 0.601 | 0.373   | 0.341  | 9.6 MB
FAISS Fixed Binary (sign threshold) | 0.587 | 0.367   | 0.338  | 9.6 MB
NodeMind A (with PCA)               | 0.524 | 0.306   | 0.263  | 9.6 MB

NodeMind B beats FAISS Fixed Binary at the same index size, and trails HNSW float32 by ~2 pp R@10 (~3 pp vs exact float32) while using 48× less storage.

Retrieval Accuracy — Cell 96× (BGE-base, 256-bit)

Method                              | R@10  | NDCG@10 | MRR@10 | Index size
Float32 cosine (exact)              | 0.676 | 0.447   | 0.415  | 231 MB
HNSW float32                        | 0.669 | 0.445   | 0.414  | ~346 MB
NodeMind A (PCA → 256-bit)          | 0.535 | 0.338   | 0.310  | 2.4 MB
FAISS Fixed Binary (sign threshold) | 0.421 | 0.262   | 0.243  | 2.4 MB

At 96× compression NodeMind A beats FAISS Fixed Binary by +27% relative R@10 at the same index size.

Index Size — 75,128 Chunks

Index                           | Size    | Compression
NodeMind BGE-M3 (1024-bit)      | 9.6 MB  | 32× smaller than BGE-M3 float32
Float32 RAG (BGE-M3, 1024-dim)  | 308 MB  | 1× (reference)
HNSW BGE-M3 (float32 × 1.5)     | ~462 MB | 48× larger than the NodeMind index
NodeMind BGE-base (256-bit PCA) | 2.4 MB  | 96× smaller than BGE-base float32
Float32 RAG (BGE-base, 768-dim) | 231 MB  | 1× (reference)

Download — Verify It Yourself

All five methods ran on the same 75,128 chunks. Float32 baselines are transparent — anyone can run cosine on them to verify our recall numbers without trusting our code.

corpus.json (69 MB) — all 75,128 chunks of text; shared by every method
queries.json (2.4 MB) — 2,677 BEIR queries + official qrels
float32_rag_bgem3.pkl (308 MB) — standard RAG float32 baseline (BGE-M3); what Pinecone/Weaviate keep
float32_rag_bgebase.pkl (231 MB) — standard RAG float32 baseline (BGE-base, 768-dim)
nodemind_index_32x.pkl (9.6 MB) — NodeMind binary index, 1024-bit fingerprints; 32× smaller than float32
nodemind_index_96x.pkl (2.4 MB) — NodeMind binary index, 256-bit fingerprints; 96× smaller than float32
benchmark_results.json (2 KB) — full metrics (5 methods × 2 cells)
verify_demo.py (3 KB) — one-shot script: prints sizes, ratios, and recall numbers

Verify compression in Python

import os

f32_m3   = os.path.getsize("float32_rag_bgem3.pkl")
nm_32x   = os.path.getsize("nodemind_index_32x.pkl")
f32_base = os.path.getsize("float32_rag_bgebase.pkl")
nm_96x   = os.path.getsize("nodemind_index_96x.pkl")

print(f"Float32 RAG (BGE-M3)    : {f32_m3   / 1e6:7.1f} MB")
print(f"NodeMind 32x            : {nm_32x   / 1e6:7.1f} MB   →  {f32_m3   / nm_32x:5.1f}× smaller")
print(f"Float32 RAG (BGE-base)  : {f32_base / 1e6:7.1f} MB")
print(f"NodeMind 96x            : {nm_96x   / 1e6:7.1f} MB   →  {f32_base / nm_96x:5.1f}× smaller")

# Expected:
# Float32 RAG (BGE-M3)    :  307.7 MB
# NodeMind 32x            :    9.6 MB   →   32.0× smaller
# Float32 RAG (BGE-base)  :  230.8 MB
# NodeMind 96x            :    2.4 MB   →   96.0× smaller

Run a query against the float32 baseline

import pickle, numpy as np
from sentence_transformers import SentenceTransformer

with open("float32_rag_bgem3.pkl", "rb") as f:
    rag = pickle.load(f)
embs = rag["embeddings"]                          # (75128, 1024) float32

enc = SentenceTransformer("BAAI/bge-m3")
q   = enc.encode(["your query here"], normalize_embeddings=True)[0]
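# dot product = cosine similarity here, since all vectors are L2-normalised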
top = np.argsort(-(embs @ q))[:5]
print(top)

The float32 RAG .pkl files are transparent — verify our recall numbers without trusting our code. NodeMind .pkl files use opaque pickle keys; the codec is patent-protected (AU 2026904283).

How It Works

1. Embed

Text is chunked and embedded with a sentence model (BGE-M3 or BGE-base), producing a float32 vector per chunk.

2. Binarise

Each embedding is converted to a compact binary fingerprint using pre-computed index metadata. Integer-only — no GPU at query time. Method is patent-protected (AU 2026904283).
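The NodeMind binarisation itself isn't published, but the FAISS Fixed Binary baseline in the tables above illustrates the simplest form of this step — a per-dimension sign threshold (baseline only, not the patented method):

import numpy as np

def sign_binarise(emb):
    # FAISS-style sign threshold — the baseline NodeMind is compared against,
    # NOT the AU 2026904283 method
    bits = (emb > 0).astype(np.uint8)   # 1024 dims → 1024 bits
    return np.packbits(bits)            # → 128 bytes (uint8[128])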

3. Index (MIH)

Binary fingerprints stored in a Multi-Index Hash structure. Query finds candidates by Hamming distance — pure integer arithmetic, any CPU. MIH structure: Norouzi et al. CVPR 2012. Novel contribution (AU 2026904283): patent-pending binary fingerprinting + portable single-file format.
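A minimal sketch of the MIH idea from Norouzi et al. — split each code into m substrings, hash each into its own table, and probe only matching buckets at query time. Parameters here are illustrative: 128-byte codes split into 32 four-byte substrings.

from collections import defaultdict

M = 32                                          # 128-byte code → 32 substrings of 4 bytes
tables = [defaultdict(list) for _ in range(M)]

def index_fp(i, fp):                            # fp: uint8 array of 128 bytes
    for t in range(M):
        tables[t][fp[4*t:4*t + 4].tobytes()].append(i)

def candidates(q_fp):
    # pigeonhole: any code within Hamming radius r < M matches the query
    # exactly in at least one substring, so probing M buckets finds it
    out = set()
    for t in range(M):
        out.update(tables[t].get(q_fp[4*t:4*t + 4].tobytes(), []))
    return out                                  # re-rank candidates by full Hamming distance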

4. Query

Embed query → binarise → Hamming search. Single .pkl file, no server, no Docker, no external DB.

Honest Caveats

Patents

AU 2026904283 — NodeMind Codec & Index: patent-pending binarisation method + portable single-file binary fingerprint index format.

Filed at IP Australia. Built in Coleambally, NSW, Australia.