# PRD_Addendum_LightRAG_LearningLLM_Enhanced_Metrics.md
PRD · General AI Theory


_(three enhancements aligned with PLMS/PAS/HMI)_



Trent Carter

11/7/2025

## A) LightRAG for Codebase (with Vector Manager agent)

Objective: maintain a continuously updated code + graph index for precise retrieval and dependency-aware planning.

Scope (V1):
- Ingest repo on vp new and on every commit (post-commit hook).
- Build both semantic vectors and a call/import graph (tree-sitter/ctags).
- Expose queries: where_defined(symbol), who_calls(fn), impact_set(file|symbol), nearest_neighbors(snippet).
- Vector Manager agent owns refresh cadence, backfills, and integrity checks; surfaces drift warnings in HMI.
- APIs:
  - rag.refresh(scope) → repo or subpath.
  - rag.query(kind, payload) → returns code locations + graph paths.
  - rag.snapshot() → writes rag_snapshot.json bound to the git SHA for reproducibility.
- KPIs / SLOs:
  - Index freshness ≤ 2 min from commit.
  - Query latency P95 ≤ 300 ms (local).
  - Coverage: ≥ 98% of files indexed; graph edges on par with a static analyzer.
- Risks:
  - Large monorepos → shard the index per submodule; lazy-load on demand.
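The query surface above (rag.query(kind, payload) returning code locations plus graph paths) can be sketched as a typed dispatcher. This is a hypothetical illustration only: the names RagResult, CodeLocation, and the index shape are assumptions, not the shipped API.

```python
from dataclasses import dataclass, field
from typing import Literal

# The four query kinds named in the Scope section.
QueryKind = Literal["where_defined", "who_calls", "impact_set", "nearest_neighbors"]

@dataclass
class CodeLocation:
    path: str      # file path relative to repo root
    line: int      # 1-based line number
    symbol: str    # symbol name at that location

@dataclass
class RagResult:
    locations: list[CodeLocation] = field(default_factory=list)  # matching code locations
    graph_paths: list[list[str]] = field(default_factory=list)   # call/import chains

def rag_query(kind: QueryKind, payload: str, index: dict[str, CodeLocation]) -> RagResult:
    """Toy dispatcher over a pre-built symbol index {name: CodeLocation}."""
    if kind == "where_defined":
        loc = index.get(payload)
        return RagResult(locations=[loc] if loc else [])
    # who_calls / impact_set / nearest_neighbors would walk the graph or
    # vector store; omitted in this sketch.
    raise NotImplementedError(f"query kind {kind!r} not sketched here")
```

A real implementation would back the index with the tree-sitter-derived graph and the vector store, but the request/response shape stays this small.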

## B) Planner Learning LLM (project-experience model)

Objective: reduce cost and time by training a planner-facing LLM on what worked (per lane, per provider, per topology), both locally (per project) and globally (portfolio).

Data to learn from:
- Task tree + assignments, lane IDs, provider matrix, rehearsal outcomes, KPI passes/fails, budget runways, violations, rework counts.
- Pipeline:
  - After completion, PLMS emits a planner_training_pack.json (sanitized).
  - Trainer agent fine-tunes or LoRA-adapts the Planner model (or updates a retrieval memory) with dual partitions: LOCAL (project) and GLOBAL (portfolio).
  - A/B validation: re-run the same project template with the updated Planner (no human) and compare units (time, tokens by type, cost, energy). Target: ≥ 15% median improvement after 10 projects.
- Serving:
  - Planner uses GLOBAL first, overlaying LOCAL deltas when the repo/team matches.
  - Cold start: fall back to default priors + CI bands.
- KPIs / SLOs:
  - Estimation MAE% drops over time (goal: ≤ 20% at 10 projects).
  - Rework rate ↓, KPI violations ↓, budget overruns ↓.
- Risks:
  - Calibration poisoning → include only runs where (baseline|hotfix) ∧ validation_pass ∧ !sandbox.
  - Privacy → anonymize task text; keep only structured features when required.
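The calibration-poisoning gate above reduces to a single predicate over run receipts. A minimal sketch, assuming receipt fields named run_type, validation_pass, and sandbox (these field names are illustrative, not part of the PRD):

```python
def eligible_for_training(receipt: dict) -> bool:
    """Admit a run into planner_training_pack.json only if it is a baseline
    or hotfix run, passed validation, and was not a sandbox run:
    (baseline|hotfix) AND validation_pass AND NOT sandbox."""
    return (
        receipt.get("run_type") in {"baseline", "hotfix"}
        and bool(receipt.get("validation_pass", False))
        and not receipt.get("sandbox", False)
    )
```

Defaulting missing fields to the restrictive value (reject) keeps malformed receipts out of the training set by construction.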

## C) Multi-Metric Telemetry & Visualization (incl. Energy/Carbon)

Objective: track and visualize time, tokens (input/output/tool-use/think), cost, and energy (estimated) separately; allow custom roll-ups per stakeholder.

Data model (already compatible):
- Extend receipts to log the token breakdown per step.
- Add an energy estimator: E ≈ (GPU_kW × active_time) + (CPU_kW × active_time), with model-specific coefficients; store in receipts.
- HMI:
  - Stacked bars per task and per lane (time / token types / cost / energy).
  - Budget runway (already added) + carbon overlay for “green” stakeholders.
  - Compare runs: show percent deltas against prior baselines for the same project template.
- APIs:
  - GET /metrics?with_ci=1&breakdown=all → returns mean + CI for each metric and token subtype.
  - GET /compare?runA=…&runB=… → structured diff with significance flags.
- KPIs / SLOs:
  - Visualization latency ≤ 1 s for recent projects.
  - Metrics completeness: ≥ 99% of steps report all four metric classes.
- Risks:
  - Energy estimates are imperfect → clearly label them as estimated and show coefficient sources; allow per-cluster overrides.
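The estimator formula above is direct to implement. A minimal sketch, assuming per-profile kW coefficients and active seconds as inputs; the ENERGY_COEFFS table and its values are illustrative placeholders (the per-cluster overrides the Risks item calls for would populate it), not shipped defaults:

```python
# Assumed power-draw coefficients (kW while active) per hardware profile.
# Values are placeholders; real entries would cite their sources, per the
# "label as estimated" risk mitigation.
ENERGY_COEFFS: dict[str, dict[str, float]] = {
    "default": {"gpu_kw": 0.35, "cpu_kw": 0.10},
}

def estimate_energy_kwh(gpu_active_s: float, cpu_active_s: float,
                        profile: str = "default") -> float:
    """E ≈ (GPU_kW × GPU active time) + (CPU_kW × CPU active time), in kWh.

    Active times are in seconds; dividing by 3600 converts kW·s to kWh.
    """
    c = ENERGY_COEFFS[profile]
    return (c["gpu_kw"] * gpu_active_s + c["cpu_kw"] * cpu_active_s) / 3600.0
```

Storing the profile name alongside the result in the receipt lets the HMI surface which coefficient set produced each estimate.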

## Cross-cutting: how these three fit PLMS/PAS/HMI

- PLMS: planning uses LightRAG to localize work; estimates borrow historical priors; rehearsal uses RAG to sample representative strata.
- PAS: the Vector Manager triggers re-indexing after artifact writes; KPI receipts attach the RAG lookups used.
- HMI: adds a RAG search widget, a dependency graph view (2D/3D), a learning-gains panel (units saved vs. the last baseline), and multi-metric bars with a carbon overlay.

## Open questions (call these out in stand-up)

- Rename “VP of Engineering”? Proposal: “Project Executive (PEX) Agent” to avoid org-role confusion.
- Local model pack defaults? Decide exact SKUs and VRAM targets for Mac/PC.
- Energy coefficients: which GPU/CPU profiles to ship by default?
- RAG indexer: prefer tree-sitter or ctags + custom? (I recommend tree-sitter for rich edges, with ctags as a fallback.)
- Planner training cadence: every run vs. nightly batch? (I recommend nightly.)