Benchmarks

Three reproducible benchmarks · Real-world text · Multimodal · Official BEIR qrels · Independently verifiable — download and run it yourself.

Jump to: Real-World (500K chunks)  ·  Multimodal (Text · Image · Audio · Table · Code)  ·  BEIR-Combined (75K chunks · official qrels)

32× vs float32 RAG   ·   48× vs HNSW index   ·   96× BGE-base 256-bit   ·   1.000 Recall@10

1. Real-World Benchmark — 500,000 Chunks

Mixed corpus: Wikipedia + arXiv + Project Gutenberg. Embedded with BAAI/bge-m3 on NVIDIA A40.

Corpus & Methodology

Mixed real-world corpus across three domains — general knowledge, scientific literature, and long-form prose.

Source                     | Domain            | Size        | Description
Wikipedia (Simple English) | General knowledge | ~100 MB     | Encyclopedia articles
arXiv papers               | Science / ML      | ~40 MB      | CS & ML abstracts + intros
Project Gutenberg          | Literature        | ~28 MB      | Public domain books
Total                      | Mixed             | ~168 MB raw | 642,939 paragraphs → 500,000 chunks

Chunking: 400 words / chunk, 50-word overlap. Embedding: BAAI/bge-m3 (1024-dim) on NVIDIA A40. Recall: 1,000 queries vs exact cosine top-k ground truth.
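A minimal sketch of that chunking scheme (illustrative helper, not the benchmark's actual code):

def chunk_words(text, size=400, overlap=50):
    # 400-word windows, advancing 350 words per step → 50-word overlap
    words = text.split()
    step  = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]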

Self-retrieval protocol — queries are perturbed corpus chunks. Optimistic for binary methods. See Caveats below.

Retrieval Accuracy

BGE-M3 · 1024-bit binary fingerprints

Metric    | NodeMind MIH | Ground Truth
Recall@1  | 0.999        | 1.000
Recall@3  | 0.999        | 1.000
Recall@5  | 1.000        | 1.000
Recall@10 | 1.000        | 1.000
Recall@20 | 1.000        | 1.000
MRR@10    | 0.9992       | 1.000

BGE-base · 768-bit and 256-bit (PCA)

Metric    | 768-bit | 256-bit (PCA)
Recall@1  | 0.999   | 1.000
Recall@5  | 1.000   | 1.000
Recall@10 | 1.000   | 1.000
MRR@10    | 0.9995  | 1.000

Index Size — 500,000 Chunks

Index                               | Size     | Bytes / chunk | vs float32
NodeMind BGE-M3 (1024-bit)          | 64 MB    | 128 B         | 32×
Float32 RAG — BGE-M3 (baseline)     | 2,048 MB | 4,096 B       | 1× (reference)
HNSW index (float32 × 1.5 overhead) | 3,072 MB | 6,144 B       | 48× vs NodeMind
NodeMind BGE-base 256-bit (PCA)     | 16 MB    | 32 B          | 96× (vs BGE-base float32)

Index only — document text stored separately and equally in all systems.

Download — Verify It Yourself

All files generated from the same 500,000 chunks. Download NodeMind + float32 RAG side by side to verify compression ratios yourself.

nm_bgem3_index.pkl (64 MB) — NodeMind BGE-M3 binary fingerprint index; 32× smaller than float32
rag_bgem3_index.pkl (2,048 MB) — float32 RAG baseline; verify the 32× compression yourself
hnsw_size_reference.txt (<1 KB) — HNSW = float32 × 1.5; formula and explanation behind the 48× number
nm_bgebase256_index.pkl (16 MB) — NodeMind BGE-base 256-bit PCA index; 96× smaller than float32
corpus.pkl (~144 MB) — 500K text chunks; shared source for all indexes
NodeMind_RealWorld_Benchmark.pdf (~1 MB) — full benchmark report with methodology, tables, and caveats

Verify compression in Python

# pip install sentence-transformers (only needed for query, not verification)
import pickle

with open("nm_bgem3_index.pkl", "rb") as f: nm  = pickle.load(f)
with open("rag_bgem3_index.pkl","rb") as f: rag = pickle.load(f)

nm_mb  = nm["fps"].nbytes         / 1e6   # → 64
rag_mb = rag["embeddings"].nbytes  / 1e6   # → 2048
ratio  = rag["embeddings"].nbytes // nm["fps"].nbytes   # → 32

print(f"NodeMind : {nm_mb:.0f} MB")
print(f"Float32  : {rag_mb:.0f} MB")
print(f"Ratio    : {ratio}×")

# BGE-base 256-bit (96×)
with open("nm_bgebase256_index.pkl","rb") as f: nm96 = pickle.load(f)
# nm96["fps"] shape: (500000, 32)  →  16 MB
# float32 baseline: 500000 × 768 × 4 = 1,536 MB  →  96×

Run a query

import numpy as np
from sentence_transformers import SentenceTransformer

model  = SentenceTransformer("BAAI/bge-m3")
fps    = nm["fps"]
with open("corpus.pkl","rb") as f: corpus = pickle.load(f)
chunks = corpus["chunks"]

POPCOUNT = np.array([bin(i).count('1') for i in range(256)], dtype=np.int32)

def query_nodemind(text, top_k=5):
    emb  = model.encode([text], normalize_embeddings=True)[0]
    q_fp = _binarise(emb, nm)  # binarisation uses index metadata (patent-protected)
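    # XOR + popcount lookup = Hamming distance from the query to every stored fingerprint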
    dists = POPCOUNT[np.bitwise_xor(fps, q_fp[np.newaxis, :])].sum(axis=1)
    top   = np.argsort(dists)[:top_k]
    return [(int(dists[i]), chunks[i][:120]) for i in top]

for dist, text in query_nodemind("What is quantum entanglement?"):
    print(f"  [{dist:4d}] {text}")

The _binarise function applies the metadata stored in the .pkl file; the exact method is covered by AU 2026904283. The index is self-contained — you can query it without reading the patent.

2. Multimodal Benchmark — Text · Image · Audio · Table · Code

200 items across 5 modalities · BGE-Visualized-M3 (joint text+image, 1024-dim) vs Gemini embedding-2 float32 RAG · NVIDIA A40 · 50 ground-truth-mapped queries.

32× NM-1024 vs float32   ·   128× NM-256 vs float32   ·   5 modalities tested   ·   1.000 Recall@1 on every variant

Corpus & Methodology

Real files across five modalities — text chunks, photographs, environmental audio clips, structured tables, and source code.

Modality | Items | Queries | Description
Text     | 50    | 10      | 5 topics × 10 chunks — astronomy, biology, computing, medicine, climate
Image    | 50    | 10      | 5 categories × 10 real Unsplash photos — cats, dogs, cars, food, nature
Audio    | 40    | 10      | ESC-10 clips — dog_bark, rain, sea_waves, crackling_fire (CC-BY 4.0)
Table    | 30    | 10      | 3 types × 10 unique tables — company / city / portfolio identifiers
Code     | 30    | 10      | 10 Python algorithms + 10 SQL queries + 10 Bash scripts
Total    | 200   | 50      | BGE-Visualized-M3 1024-dim · NVIDIA A40 (46 GB)

RAG baseline: gemini-embedding-2 float32 + HNSW (paid API). NodeMind: bge-visualized-m3 (MIT, self-hosted). Audio routed through Whisper transcription before embedding. Recall measured against 1-to-1 ground truth.
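A minimal sketch of that audio routing, assuming the openai-whisper package (model size and file name are illustrative; plain bge-m3 stands in here for the BGE-Visualized-M3 model actually used):

import whisper
from sentence_transformers import SentenceTransformer

asr  = whisper.load_model("base")                       # any Whisper size; "base" is illustrative
text = asr.transcribe("audio/dog_bark.wav")["text"]     # audio → transcript

enc = SentenceTransformer("BAAI/bge-m3")                # stand-in for BGE-Visualized-M3
emb = enc.encode([text], normalize_embeddings=True)[0]  # transcript embeds like any text chunk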

Self-retrieval protocol — each query has exactly one correct corpus item. Optimistic for retrieval methods. See Caveats below.

Retrieval Accuracy — All Variants vs Gemini RAG

Variant                       | Recall@1 | Recall@5 | Recall@10 | MRR@10
NodeMind NM-1024 (32×)        | 1.0000   | 1.0000   | 1.0000    | 1.0000
NodeMind NM-512 (64×)         | 1.0000   | 1.0000   | 1.0000    | 1.0000
NodeMind NM-256 (128×)        | 1.0000   | 1.0000   | 1.0000    | 1.0000
Gemini RAG float32 (baseline) | 1.0000   | 1.0000   | 1.0000    | 1.0000

All 3 NodeMind variants match the float32 baseline exactly across all 50 queries.

Per-Modality Recall@1

Modality | Queries | NM-1024 | NM-512 | NM-256 | Gemini RAG
Text     | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Image    | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Audio    | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Table    | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000
Code     | 10      | 1.0000  | 1.0000 | 1.0000 | 1.0000

Index Size — Production Scale (500,000 chunks)

Index                                      | Size     | Bytes / chunk | Compression
NodeMind NM-1024 (1024-bit)                | 64 MB    | 128 B         | 32×
NodeMind NM-512 (512-bit)                  | 32 MB    | 64 B          | 64×
NodeMind NM-256 (256-bit)                  | 16 MB    | 32 B          | 128×
Float32 RAG (Gemini embedding-2, 1024-dim) | 2,048 MB | 4,096 B       | 1× (reference)

Same compression math as real-world benchmark — bytes per fingerprint × N chunks. Float32 RAG baseline is identical at production scale regardless of modality.
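Spelled out, the math is pure arithmetic — no downloads needed:

N = 500_000                                  # chunks at production scale
f32_mb = 1024 * 4 * N / 1e6                  # 1024-dim float32 baseline → 2,048 MB
for bits, name in [(1024, "NM-1024"), (512, "NM-512"), (256, "NM-256")]:
    mb = (bits // 8) * N / 1e6
    print(f"{name}: {bits // 8} B/chunk × {N:,} = {mb:.0f} MB  →  {f32_mb / mb:.0f}×")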

Download — Verify It Yourself

All artefacts from the 200-item corpus. Includes the indexes, the corpus itself, real images and audio, and a side-by-side query script.

NodeMind_Multimodal_Benchmark_FINAL.pdf (578 KB) — 11-page benchmark report: per-modality recall, query/result pairs, methodology
nodemind_index.pkl (924 KB) — all 3 binary variants in one file; verify 32× / 64× / 128× yourself
gemini_rag_index.pkl (851 KB) — float32 baseline: Gemini embedding-2 vectors
gemini_hnsw.bin (830 KB) — HNSW graph built on Gemini float32 vectors
corpus.json (67 KB) — all 200 items: text, image captions, audio labels
queries.json (8 KB) — 50 benchmark queries with ground-truth item IDs
images/ (50 photos, 1.7 MB) — 50 verified Unsplash JPGs: cats, dogs, cars, food, nature
audio/ (40 clips, 17 MB) — 40 real ESC-10 environmental WAV files (CC-BY 4.0)
query_demo.py (16 KB) — run any query, see NodeMind vs Gemini RAG side by side

Verify compression in Python

import os

nm_bytes  = os.path.getsize("nodemind_index.pkl")
rag_bytes = os.path.getsize("gemini_rag_index.pkl")

print(f"NodeMind index  : {nm_bytes  / 1e6:.3f} MB")
print(f"Gemini RAG index: {rag_bytes / 1e6:.3f} MB")
print(f"RAG is {rag_bytes // nm_bytes}× larger than NodeMind")

# At production scale — 500,000 chunks:
# NodeMind NM-1024  =    64 MB   →   32× smaller
# NodeMind NM-512   =    32 MB   →   64× smaller
# NodeMind NM-256   =    16 MB   →  128× smaller
# Float32 RAG       = 2,048 MB   (1× reference)

Run a query

# pip install flagembedding hnswlib numpy
# wget https://huggingface.co/BAAI/BGE-Visualized/resolve/main/Visualized_m3.pth -O bge-visualized-m3.pth

# NodeMind only — no API key needed
python query_demo.py --no-rag --query "dog barking loudly outdoors"

# NodeMind vs Gemini RAG side by side
export GEMINI_API_KEY=your_key_here
python query_demo.py --query "orange tabby cat green eyes"
python query_demo.py --query "crackling fire campfire sound"
python query_demo.py --query "SELECT with JOIN employees departments"

The NodeMind binarisation method is patent-protected (AU 2026904283). The .pkl indexes are self-contained — they work without reading the patent.

3. BEIR-Combined Benchmark — 75,128 Chunks · 2,677 Queries · Official Qrels

Four BEIR datasets concatenated into one corpus — NFCorpus, SciFact, ArguAna, FiQA. Recall measured against official BEIR judgements, not self-retrieval. End-to-end QA retrieval.

Why this benchmark

Standard RAG (Pinecone, Weaviate, FAISS-flat) has to keep 308 MB of float32 vectors in RAM for as long as it serves queries against this 75K corpus. NodeMind doesn't — once the index is built, you can delete the float32 vectors entirely. All you keep is the 9.6 MB binary fingerprint index (or 2.4 MB at 96×).

For non-technical readers: if your AI assistant today needs 300 MB of "memory" per book, NodeMind needs 10 MB for the same book — same query, same answers. At a million documents that's "fits on a laptop" vs "$5,000/mo Pinecone bill".

Corpus & Methodology

Source           | Chunks | Queries | Qrels
NFCorpus         | 3,633  | 323     | 12,334
SciFact          | 5,183  | 300     | 339
ArguAna          | 8,674  | 1,406   | 1,406
FiQA             | 57,638 | 648     | 1,706
Total (combined) | 75,128 | 2,677   | 15,785

Embedded with BGE-M3 (1024-dim) and BGE-base-en-v1.5 (768-dim). Hardware: NVIDIA A40. Recall formula: hits / |relevant_set|.
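A minimal sketch of that recall formula (hypothetical doc IDs; the official qrels map each query to its set of relevant documents):

def recall_at_k(retrieved, relevant, k=10):
    # hits / |relevant_set| — the formula stated above
    return len(set(retrieved[:k]) & relevant) / len(relevant)

# 3 of the 5 judged-relevant docs appear in the top 10 → 0.6
print(recall_at_k(["d1", "d7", "d2", "d9", "d4"], {"d1", "d2", "d4", "d8", "d99"}))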

Retrieval Accuracy — Cell 32× / 48× (BGE-M3, 1024-bit)

Method                              | R@10  | NDCG@10 | MRR@10 | Index size
Float32 cosine (exact)              | 0.629 | 0.402   | 0.375  | 308 MB
HNSW float32 (FAISS HNSW32)         | 0.619 | 0.396   | 0.369  | ~462 MB
NodeMind B (no PCA)                 | 0.601 | 0.373   | 0.341  | 9.6 MB
FAISS Fixed Binary (sign threshold) | 0.587 | 0.367   | 0.338  | 9.6 MB
NodeMind A (with PCA)               | 0.524 | 0.306   | 0.263  | 9.6 MB

NodeMind B beats FAISS Fixed Binary at the same index size, and trails HNSW float32 by ~2 pp R@10 (~3 pp vs exact float32) while using 48× less storage.

Retrieval Accuracy — Cell 96× (BGE-base, 256-bit)

Method                              | R@10  | NDCG@10 | MRR@10 | Index size
Float32 cosine (exact)              | 0.676 | 0.447   | 0.415  | 231 MB
HNSW float32                        | 0.669 | 0.445   | 0.414  | ~346 MB
NodeMind A (PCA → 256-bit)          | 0.535 | 0.338   | 0.310  | 2.4 MB
FAISS Fixed Binary (sign threshold) | 0.421 | 0.262   | 0.243  | 2.4 MB

At 96× compression NodeMind A beats FAISS Fixed Binary by +27% relative R@10 at the same index size.

Index Size — 75,128 Chunks

Index                           | Size    | Compression
NodeMind BGE-M3 (1024-bit)      | 9.6 MB  | 32× smaller than BGE-M3 float32
Float32 RAG (BGE-M3, 1024-dim)  | 308 MB  | 1× (reference)
HNSW BGE-M3 (float32 × 1.5)     | ~462 MB | 48× larger than the NodeMind index
NodeMind BGE-base (256-bit PCA) | 2.4 MB  | 96× smaller than BGE-base float32
Float32 RAG (BGE-base, 768-dim) | 231 MB  | 1× (reference)

Download — Verify It Yourself

All five methods ran on the same 75,128 chunks. Float32 baselines are transparent — anyone can run cosine on them to verify our recall numbers without trusting our code.

corpus.json (69 MB) — all 75,128 chunks of text; shared by every method
queries.json (2.4 MB) — 2,677 BEIR queries + official qrels
float32_rag_bgem3.pkl (308 MB) — standard RAG float32 baseline (BGE-M3); what Pinecone/Weaviate keep
float32_rag_bgebase.pkl (231 MB) — standard RAG float32 baseline (BGE-base, 768-dim)
nodemind_index_32x.pkl (9.6 MB) — NodeMind binary index, 1024-bit fingerprints; 32× smaller than float32
nodemind_index_96x.pkl (2.4 MB) — NodeMind binary index, 256-bit fingerprints; 96× smaller than float32
benchmark_results.json (2 KB) — full metrics (5 methods × 2 cells)
verify_demo.py (3 KB) — one-shot script: prints sizes, ratios, and recall numbers

Verify compression in Python

import os

f32_m3   = os.path.getsize("float32_rag_bgem3.pkl")
nm_32x   = os.path.getsize("nodemind_index_32x.pkl")
f32_base = os.path.getsize("float32_rag_bgebase.pkl")
nm_96x   = os.path.getsize("nodemind_index_96x.pkl")

print(f"Float32 RAG (BGE-M3)    : {f32_m3   / 1e6:7.1f} MB")
print(f"NodeMind 32x            : {nm_32x   / 1e6:7.1f} MB   →  {f32_m3   / nm_32x:5.1f}× smaller")
print(f"Float32 RAG (BGE-base)  : {f32_base / 1e6:7.1f} MB")
print(f"NodeMind 96x            : {nm_96x   / 1e6:7.1f} MB   →  {f32_base / nm_96x:5.1f}× smaller")

# Expected:
# Float32 RAG (BGE-M3)    :  307.7 MB
# NodeMind 32x            :    9.6 MB   →   32.0× smaller
# Float32 RAG (BGE-base)  :  230.8 MB
# NodeMind 96x            :    2.4 MB   →   96.0× smaller

Run a query against the float32 baseline

import pickle, numpy as np
from sentence_transformers import SentenceTransformer

with open("float32_rag_bgem3.pkl", "rb") as f:
    rag = pickle.load(f)
embs = rag["embeddings"]                          # (75128, 1024) float32

enc = SentenceTransformer("BAAI/bge-m3")
q   = enc.encode(["your query here"], normalize_embeddings=True)[0]
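# dot product = cosine similarity here, since all vectors are L2-normalised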
top = np.argsort(-(embs @ q))[:5]
print(top)

The float32 RAG .pkl files are transparent — verify our recall numbers without trusting our code. NodeMind .pkl files use opaque pickle keys; the codec is patent-protected (AU 2026904283).

How It Works

1. Embed

Text is chunked and embedded with a sentence model (BGE-M3 or BGE-base), producing a float32 vector per chunk.

2. Binarise

Each embedding is converted to a compact binary fingerprint using pre-computed index metadata. Integer-only — no GPU at query time. Method is patent-protected (AU 2026904283).
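The NodeMind binarisation itself isn't published, but the FAISS Fixed Binary baseline in the tables above illustrates the simplest form of this step — a per-dimension sign threshold (baseline only, not the patented method):

import numpy as np

def sign_binarise(emb):
    # FAISS-style sign threshold — the baseline NodeMind is compared against,
    # NOT the AU 2026904283 method
    bits = (emb > 0).astype(np.uint8)   # 1024 dims → 1024 bits
    return np.packbits(bits)            # → 128 bytes (uint8[128])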

3. Index (MIH)

Binary fingerprints stored in a Multi-Index Hash structure. Query finds candidates by Hamming distance — pure integer arithmetic, any CPU. MIH structure: Norouzi et al. CVPR 2012. Novel contribution (AU 2026904283): patent-pending binary fingerprinting + portable single-file format.
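A minimal sketch of the MIH idea from Norouzi et al. — split each code into m substrings, hash each into its own table, and probe only matching buckets at query time. Parameters here are illustrative: 128-byte codes split into 32 four-byte substrings.

from collections import defaultdict

M = 32                                          # 128-byte code → 32 substrings of 4 bytes
tables = [defaultdict(list) for _ in range(M)]

def index_fp(i, fp):                            # fp: uint8 array of 128 bytes
    for t in range(M):
        tables[t][fp[4*t:4*t + 4].tobytes()].append(i)

def candidates(q_fp):
    # pigeonhole: any code within Hamming radius r < M matches the query
    # exactly in at least one substring, so probing M buckets finds it
    out = set()
    for t in range(M):
        out.update(tables[t].get(q_fp[4*t:4*t + 4].tobytes(), []))
    return out                                  # re-rank candidates by full Hamming distance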

4. Query

Embed query → binarise → Hamming search. Single .pkl file, no server, no Docker, no external DB.

Honest Caveats

Patents

AU 2026904283 — NodeMind Codec & Index: patent-pending binarisation method + portable single-file binary fingerprint index format.

Filed at IP Australia. Built in Coleambally, NSW, Australia.