What I set up

Experiment, vecRAG

2025-10-01 · 4 min read · 816 words

Yes. I've added a RAG-only benchmarking harness under RAG/ that evaluates vecRAG dense retrieval against other baselines and reports accuracy and latency.

What I set up

  • RAG harness:
    - RAG/bench.py — runs retrieval-only benchmarks with three backends:
      - vec — FAISS dense retrieval (vecRAG)
      - lex — lexical baseline (token overlap on concept_texts)
      - lightvec — LightRAG vector-only via src/adapters/lightrag/vectorstore_faiss.py
    - RAG/README.md — usage and options
    - RAG/__init__.py
  • Reused components:
    - src/db_faiss.py (FaissDB) for FAISS search
    - src/vectorizer.py (EmbeddingBackend) for GTR-T5 768D embeddings
    - src/adapters/lightrag/vectorstore_faiss.py — FAISS bridge for LightRAG
  • Metrics & outputs:
    - Metrics: P@1, P@5, MRR@10, nDCG@10
    - Latency: mean and p95, per backend
    - Per-query JSONL and a Markdown summary
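With a single gold document per query, the metrics above reduce to simple rank formulas. A minimal sketch of how they can be computed (function names are illustrative, not the harness's actual code):

```python
import math

def rank_metrics(ranked_ids, gold_id, k_p=(1, 5), k_rank=10):
    """P@k, MRR@k, and nDCG@k for a query with a single relevant item."""
    try:
        rank = ranked_ids.index(gold_id) + 1  # 1-based position of the gold doc
    except ValueError:
        rank = None
    out = {f"p@{k}": float(rank is not None and rank <= k) for k in k_p}
    out["mrr@10"] = 1.0 / rank if rank and rank <= k_rank else 0.0
    # With one relevant doc, ideal DCG = 1, so nDCG@k = 1 / log2(rank + 1).
    out["ndcg@10"] = 1.0 / math.log2(rank + 1) if rank and rank <= k_rank else 0.0
    return out

def p95(latencies_ms):
    """95th-percentile latency via nearest-rank on the sorted sample."""
    xs = sorted(latencies_ms)
    return xs[min(len(xs) - 1, math.ceil(0.95 * len(xs)) - 1)]
```

For example, a gold doc ranked third scores 0 on P@1 but 1 on P@5, 1/3 on MRR@10, and 0.5 on nDCG@10.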

How it works (vecRAG-only evaluation)

  • Reads the NPZ metadata (auto-detected or via FAISS_NPZ_PATH) and the FAISS index path from artifacts/faiss_meta.json.
  • Chooses query dimensionality automatically:
    - 768D: concept embeddings only
    - 784D: concat of tmd_dense[16] + concept_vec[768], L2-normalized for IP search
  • Datasets:
    - self: the query is the concept_text for each item; the gold label is its own position
    - cpesh: if artifacts/cpesh_cache.jsonl exists, the query is the CPESH probe (fallback: expected or concept) and the gold label is its doc_id mapped into the NPZ
  • Aligns with the acceptance gates for L1 dense-only retrieval and latency/hit-rate reporting. The lexical baseline is opt-in as a separate comparator.
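The 784D path concatenates the 16D TMD dense vector with the 768D concept embedding and L2-normalizes the result, so inner-product (IP) search behaves like cosine similarity. A minimal sketch, with illustrative names rather than the harness's actual variables:

```python
import numpy as np

def fused_query(tmd_dense: np.ndarray, concept_vec: np.ndarray) -> np.ndarray:
    """Build a 784D query: concat 16D TMD + 768D concept, L2-normalized for IP."""
    assert tmd_dense.shape == (16,) and concept_vec.shape == (768,)
    q = np.concatenate([tmd_dense, concept_vec]).astype(np.float32)
    n = np.linalg.norm(q)
    return q / n if n > 0 else q
```

Because both the stored vectors and the queries are unit-normalized, IP scores over the fused 784D space rank the same way cosine would.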
Run it

Prereqs:

  • Build the FAISS index and NPZ; ensure artifacts/faiss_meta.json points to the index.
  • Install deps: pip install -r requirements.txt (the LightRAG vector path needs lightrag-hku==1.4.8.2, already pinned).
  • Examples:

```bash
# Self-retrieval (sanity, dense-only baseline)
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec

# Add the lexical baseline for reference
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lex

# If LightRAG is installed, compare vector-only backends on the same FAISS index
python RAG/bench.py --dataset self --n 1000 --topk 10 --backends vec,lightvec

# CPESH-driven evaluation (if artifacts/cpesh_cache.jsonl exists)
python RAG/bench.py --dataset cpesh --n 1000 --topk 10 --backends vec,lex
```
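The lex backend scores candidates by token overlap against concept_texts. A minimal sketch of such a scorer (the harness's exact tokenization and scoring may differ; Jaccard overlap is an assumption here):

```python
def token_overlap_scores(query: str, docs: list) -> list:
    """Score each doc by Jaccard overlap of lowercase whitespace tokens."""
    q = set(query.lower().split())
    scores = []
    for d in docs:
        t = set(d.lower().split())
        scores.append(len(q & t) / len(q | t) if q | t else 0.0)
    return scores

def lex_top_k(query: str, docs: list, k: int = 10) -> list:
    """Return the indices of the top-k docs by overlap score."""
    s = token_overlap_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: -s[i])[:k]
```

This is deliberately weak as a comparator; the Next steps section suggests BM25 for a stronger lexical baseline.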

Useful env:

```bash
# Use the 784D fused NPZ (recommended for retrieval) or point to your NPZ explicitly
export FAISS_NPZ_PATH=artifacts/fw10k_vectors.npz

# Tune IVF probes
export FAISS_NPROBE=16
```

Outputs:

  • JSONL: RAG/results/bench_.jsonl
  • Summary: RAG/results/summary_.md
Notes

  • This is a retrieval-only harness (no LVM inference, no text generation), so vecRAG is isolated and fairly compared.
  • The harness picks the NPZ vector shape (768 vs 784) and builds queries to match. Ensure the NPZ and FAISS index dimensions correspond.
  • The LightRAG comparison is vector-only (not graph/hybrid), so all backends share the same FAISS data for a fair dense-retrieval comparison.
Next steps (optional)

  • Add BM25/ELSER baseline if you want a stronger lexical comparator.
  • Add a “hybrid” LightRAG mode to compare graph-augmented retrieval separately from dense.
Status

  • RAG-only benchmarking added: RAG/bench.py, RAG/README.md.
  • Ready to run and grade vecRAG vs other backends by score under RAG/.