What I set up

Experiment, vecRAG

2025-10-01 · 4 min read · 816 words

Yes. I've added a RAG-only benchmarking harness under RAG/ that evaluates vecRAG dense retrieval against other baselines and reports accuracy and latency.

What I set up

  • RAG harness:
    - RAG/bench.py — runs retrieval-only benchmarks with three backends:
      - vec — FAISS dense retrieval (vecRAG)
      - lex — lexical baseline (token overlap on concept_texts)
      - lightvec — LightRAG vector-only via src/adapters/lightrag/vectorstore_faiss.py
    - RAG/README.md — usage and options
    - RAG/__init__.py
  • Reused components:
    - src/db_faiss.py (FaissDB) for FAISS search
    - src/vectorizer.py (EmbeddingBackend) for GTR-T5 768D embeddings
    - src/adapters/lightrag/vectorstore_faiss.py — FAISS bridge for LightRAG
  • Metrics & outputs:
    - Metrics: P@1, P@5, MRR@10, nDCG@10
    - Latency: mean and p95, per backend
    - Per-query JSONL and a Markdown summary
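With a single gold document per query, the metrics above reduce to simple rank formulas. A minimal sketch of how they can be computed (function names are illustrative, not the harness's actual code):

```python
import math

def rank_metrics(ranked_ids, gold_id, k_p=(1, 5), k_rank=10):
    """P@k, MRR@k, and nDCG@k for a query with a single relevant item."""
    try:
        rank = ranked_ids.index(gold_id) + 1  # 1-based position of the gold doc
    except ValueError:
        rank = None
    out = {f"p@{k}": float(rank is not None and rank <= k) for k in k_p}
    out["mrr@10"] = 1.0 / rank if rank and rank <= k_rank else 0.0
    # With one relevant doc, ideal DCG = 1, so nDCG@k = 1 / log2(rank + 1).
    out["ndcg@10"] = 1.0 / math.log2(rank + 1) if rank and rank <= k_rank else 0.0
    return out

def p95(latencies_ms):
    """95th-percentile latency via nearest-rank on the sorted sample."""
    xs = sorted(latencies_ms)
    return xs[min(len(xs) - 1, math.ceil(0.95 * len(xs)) - 1)]
```

For example, a gold doc ranked third scores 0 on P@1 but 1 on P@5, 1/3 on MRR@10, and 0.5 on nDCG@10.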

How it works (vecRAG-only evaluation)

  • Reads the NPZ metadata (auto-detected or via FAISS_NPZ_PATH) and the FAISS index path from artifacts/faiss_meta.json.
  • Chooses query dimensionality automatically:
    - 768D: concept embeddings only
    - 784D: concat of tmd_dense[16] + concept_vec[768], L2-normalized for IP search
  • Datasets:
    - self: the query is the concept_text for each item; the gold label is its own position
    - cpesh: if artifacts/cpesh_cache.jsonl exists, the query is the CPESH probe (fallback: expected or concept) and the gold label is its doc_id mapped into the NPZ
  • Aligns with the acceptance gates for L1 dense-only retrieval and latency/hit-rate reporting. The lexical baseline is opt-in as a separate comparator.
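The 784D path concatenates the 16D TMD dense vector with the 768D concept embedding and L2-normalizes the result, so inner-product (IP) search behaves like cosine similarity. A minimal sketch, with illustrative names rather than the harness's actual variables:

```python
import numpy as np

def fused_query(tmd_dense: np.ndarray, concept_vec: np.ndarray) -> np.ndarray:
    """Build a 784D query: concat 16D TMD + 768D concept, L2-normalized for IP."""
    assert tmd_dense.shape == (16,) and concept_vec.shape == (768,)
    q = np.concatenate([tmd_dense, concept_vec]).astype(np.float32)
    n = np.linalg.norm(q)
    return q / n if n > 0 else q
```

Because both the stored vectors and the queries are unit-normalized, IP scores over the fused 784D space rank the same way cosine would.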
Run it

Prereqs:

  • Build the FAISS index and NPZ; ensure artifacts/faiss_meta.json points to the index.
  • Install deps: pip install -r requirements.txt (the LightRAG vector path needs lightrag-hku==1.4.8.2, already pinned).
  • Examples:

```bash
# Self-retrieval (sanity, dense-only baseline)
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec

# Add the lexical baseline for reference
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lex

# If LightRAG is installed, compare vector-only backends on the same FAISS index
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lightvec

# CPESH-driven evaluation (if artifacts/cpesh_cache.jsonl exists)
python RAG/bench.py --dataset cpesh --n 1000 --topk 10 --backends vec,lex
```
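The lex backend scores candidates by token overlap against concept_texts. A minimal sketch of such a scorer (the harness's exact tokenization and scoring may differ; Jaccard overlap is an assumption here):

```python
def token_overlap_scores(query: str, docs: list) -> list:
    """Score each doc by Jaccard overlap of lowercase whitespace tokens."""
    q = set(query.lower().split())
    scores = []
    for d in docs:
        t = set(d.lower().split())
        scores.append(len(q & t) / len(q | t) if q | t else 0.0)
    return scores

def lex_top_k(query: str, docs: list, k: int = 10) -> list:
    """Return the indices of the top-k docs by overlap score."""
    s = token_overlap_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: -s[i])[:k]
```

This is deliberately weak as a comparator; the Next steps section suggests BM25 for a stronger lexical baseline.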

Useful env:

```bash
# Use the 784D fused NPZ (recommended for retrieval) or point to your NPZ explicitly
export FAISS_NPZ_PATH=artifacts/fw10k_vectors.npz

# Tune IVF probes
export FAISS_NPROBE=16
```

Outputs:

  • JSONL: RAG/results/bench_.jsonl
  • Summary: RAG/results/summary_.md
Notes

  • This is a retrieval-only harness (no LVM inference, no text generation), so vecRAG is isolated and fairly compared.
  • The harness picks the NPZ vector shape (768 vs 784) and builds queries to match. Ensure the NPZ and FAISS index dimensions correspond.
  • The LightRAG comparison is vector-only (not graph/hybrid), so all backends share the same FAISS data for a fair dense-retrieval comparison.
Next steps (optional)

  • Add BM25/ELSER baseline if you want a stronger lexical comparator.
  • Add a “hybrid” LightRAG mode to compare graph-augmented retrieval separately from dense.
Status

  • RAG-only benchmarking added: RAG/bench.py, RAG/README.md.
  • Ready to run and grade vecRAG vs other backends by score under RAG/.