10/1/2025
Yes. I’ve added a RAG-only benchmarking harness under
RAG/ that evaluates vecRAG dense retrieval against other baselines and reports accuracy and latency.
What I set up
- RAG/bench.py — runs retrieval-only benchmarks with:
  - vec: FAISS dense (vecRAG)
  - lex: lexical baseline (token overlap on concept_texts)
  - lightvec: LightRAG vector-only via src/adapters/lightrag/vectorstore_faiss.py
- RAG/README.md — usage and options
- RAG/__init__.py
- Reuses:
  - src/db_faiss.py (FaissDB) for FAISS search
  - src/vectorizer.py (EmbeddingBackend) for GTR-T5 768D embeddings
  - src/adapters/lightrag/vectorstore_faiss.py — FAISS bridge for LightRAG
- Metrics: P@1, P@5, MRR@10, nDCG@10
- Latency: mean and p95, per backend
- Outputs: per-query JSONL and a Markdown summary
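The lexical baseline above can be sketched as a simple token-overlap scorer. This is a minimal illustration of the idea, not the exact code in RAG/bench.py; the function names are hypothetical:

```python
def lexical_scores(query: str, corpus: list[str]) -> list[float]:
    """Score each document by token overlap with the query."""
    q_tokens = set(query.lower().split())
    scores = []
    for text in corpus:
        d_tokens = set(text.lower().split())
        overlap = len(q_tokens & d_tokens)  # shared tokens
        scores.append(overlap / max(len(q_tokens), 1))
    return scores

def lexical_topk(query: str, corpus: list[str], k: int = 10) -> list[int]:
    """Return indices of the top-k documents by overlap score."""
    scores = lexical_scores(query, corpus)
    return sorted(range(len(corpus)), key=lambda i: -scores[i])[:k]
```

A baseline like this is useful as a floor: dense retrieval should clearly beat raw token overlap on paraphrased queries.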
How it works (vecRAG-only evaluation)
It loads vectors from the NPZ (FAISS_NPZ_PATH) and the FAISS index path from artifacts/faiss_meta.json.
- 768D: concept embeddings only
- 784D: concat tmd_dense[16] + concept_vec[768], L2-normalized for IP
Datasets:
- self: query is the concept_text for each item; gold is its own position
- cpesh: if artifacts/cpesh_cache.jsonl exists, query = CPESH probe (fallback: expected or concept); gold is its doc_id mapped into the NPZ
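The 784D fusion above can be sketched with NumPy. This is a minimal illustration under the stated layout (16D TMD followed by the 768D concept vector); the function name is hypothetical:

```python
import numpy as np

def fuse_784d(tmd_dense: np.ndarray, concept_vec: np.ndarray) -> np.ndarray:
    """Concatenate 16D TMD features with the 768D concept embedding,
    then L2-normalize so inner-product (IP) search behaves like cosine."""
    fused = np.concatenate([tmd_dense, concept_vec], axis=-1)  # shape (784,)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused
```

Normalizing after concatenation matters: with an IP index, unnormalized fused vectors would let the TMD block's magnitude skew the ranking.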
Run it
Prereqs:
- artifacts/faiss_meta.json points to the index.
- pip install -r requirements.txt (the LightRAG vector path needs lightrag-hku==1.4.8.2, already pinned).
Examples:
```bash
# Self-retrieval (sanity, dense-only baseline)
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec

# Add lexical baseline for reference
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lex

# If LightRAG is installed, compare vector-only backends on the same FAISS index
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lightvec
```
Useful env:
```bash
# Use the 784D fused NPZ (recommended for retrieval) or point to your NPZ explicitly
export FAISS_NPZ_PATH=artifacts/fw10k_vectors.npz

# Tune IVF probes
export FAISS_NPROBE=16
```
Outputs:
- RAG/results/bench_.jsonl
- RAG/results/summary_.md
Notes
Next steps (optional)
Status
Thought for 9s
Yes. I implemented a RAG-only benchmarking harness under
RAG/ to evaluate vecRAG in isolation and compare it against other RAG baselines by score.
What’s included
- RAG/bench.py — retrieval-only benchmark runner.
- RAG/README.md — instructions and options.
- RAG/__init__.py.
Backends:
- vec: FAISS dense (your vecRAG) via src/db_faiss.py (FaissDB.search()).
- lex: simple lexical token-overlap baseline over concept_texts.
- lightvec: LightRAG vector-only via src/adapters/lightrag/vectorstore_faiss.py.
Datasets:
- self: query = concept_texts[i], label = its own position. Sanity check of pure vector retrieval.
- cpesh: if artifacts/cpesh_cache.jsonl exists, query = CPESH probe (fallback: expected or concept), label = CPESH doc_id mapped into NPZ positions.
Metrics:
- P@1, P@5, MRR@10, nDCG@10.
- Mean/P95 latency per backend.
- Automatically matches NPZ vector dimension:
- 768D: concept embedding only.
- 784D: 16D TMD + 768D concept (normalized) to align with IVF-IP.
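The metrics listed above can be sketched as follows, for the single-gold-item case this benchmark uses. A minimal illustration, not the exact code in RAG/bench.py:

```python
import math

def precision_at_k(ranked: list, gold, k: int) -> float:
    """With a single gold item, this reduces to hit rate in the top-k."""
    return 1.0 if gold in ranked[:k] else 0.0

def mrr_at_k(ranked: list, gold, k: int = 10) -> float:
    """Reciprocal of the gold item's rank, 0 if absent from the top-k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc == gold:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked: list, gold, k: int = 10) -> float:
    """Single relevant item: DCG = 1/log2(rank+1), ideal DCG = 1."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc == gold:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile latency."""
    ordered = sorted(latencies_ms)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]
```

Scores are averaged over all queries per backend to produce the summary table.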
How to run
```bash
pip install -r requirements.txt
```
Ensure artifacts/faiss_meta.json has an index_path pointing to your FAISS index. The NPZ is resolved from FAISS_NPZ_PATH (or common artifact names).
Optional env:
```bash
export FAISS_NPZ_PATH=artifacts/fw10k_vectors.npz  # or your NPZ
export FAISS_NPROBE=16                             # IVF tuning
```
```bash
# Dense-only vecRAG baseline, self-retrieval
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec

# Add lexical baseline for reference
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lex

# If LightRAG is installed, compare vector-only backends on the same FAISS index
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lightvec

# CPESH-driven evaluation (if artifacts/cpesh_cache.jsonl exists)
python RAG/bench.py --dataset cpesh --n 1000 --topk 10 --backends vec,lex
```
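Building queries from the CPESH cache can be sketched as below. The field names (probe, expected, concept, doc_id) are assumptions about the JSONL schema, and the function is a hypothetical simplification of what RAG/bench.py does:

```python
import json

def load_cpesh_queries(path: str) -> list[tuple[str, str]]:
    """Return (query, doc_id) pairs: prefer the CPESH probe,
    falling back to the expected answer, then the concept text."""
    pairs = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            # fallback chain: probe -> expected -> concept
            query = rec.get("probe") or rec.get("expected") or rec.get("concept")
            if query and rec.get("doc_id") is not None:
                pairs.append((query, rec["doc_id"]))
    return pairs
```

Each doc_id is then mapped to its NPZ position so the gold label matches what FAISS returns.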
Outputs:
- RAG/results/bench_.jsonl
- RAG/results/summary_.md
Notes
- lightvec uses the same FAISS index and NPZ metadata through LightRAGFaissVectorStore.
- 784D vectors are fused as [tmd_dense(16), concept(768)] and normalized for IP search.