Product Requirements Document (PRD)
Title
Vector‑Only Latent Nemotron‑Nano‑2 (Hybrid Mamba)
Version
v1.0 (formal draft)
Owners
Product/Research: Trent Carter & Partner (ChatGPT)
Tech Lead: TBD
ML/Infra: TBD
1) Executive Summary
We will adapt a Nemotron‑Nano‑2–style hybrid Mamba/Transformer into a vector‑only latent model. The core model ingests/produces sequences of latent vectors instead of text tokens. This preserves Mamba’s sequential inductive bias while removing a fixed token vocabulary. A cloud concept store (ANN cosine) provides fast text reconstruction via nearest neighbors; a vec2text fallback handles out‑of‑vocabulary (OOV) cases and bootstraps the store online.
Why now: Enables language‑free reasoning and concept‑native interfaces, while remaining interoperable with GTR‑T5 (768D) and vec2text (IELAB/JXE) for evaluation and text I/O.
2) Goals & Non‑Goals
Goals
Vector‑native I/O: Replace token embeddings with latent vectors (default 768D) and special control vectors (`<BOS>`, `<EOS>`, `<PAD>`, `<MASK>`).
Long‑context learning: Achieve stable training with latent sequence length target L ≈ 8k via curriculum.
Robust decoding: ANN cosine decoding with vec2text fallback and online concept insertion.
Non‑collapse: Prevent representation collapse using AR/MLM/InfoNCE/VICReg + multi‑horizon.
Evaluation: BLEU‑4 & ROUGE‑L (Expected Text vs. Output Text), and cosine similarity between GTR‑T5(Expected Text) and the model's output vector.
Non‑Goals (v1)
Multimodal latent training; learned VQ codebooks at scale (optional later); RL alignment.
3) User Stories
R&D Engineer: “I can pass sequences of latent concept vectors, train at long context, and decode to text for inspection.”
Systems Engineer: “The concept store scales, supports online inserts, and provides deterministic, low‑latency nearest‑neighbor lookups.”
Evaluator: “I can run standardized BLEU/ROUGE and cosine metrics and compare runs over time.”
4) Functional Requirements
Latent Interface
- Input vectors z ∈ ℝ^768 (pluggable dimension), unit‑norm; bridge to internal dim (default 512) with LayerNorm.
- Special control latents: trainable `<BOS>`, `<EOS>`, `<PAD>`, `<MASK>` vectors.
- Output step: 768D unit‑norm vector per position.
Training‑Time Latentizer
- Deterministic clause/phrase segmentation (~17 words → 1 concept on average).
- Pack sequences up to L; prepend `<BOS>`, append `<EOS>`, pad with `<PAD>`.
- Generate two stochastic views per sequence for contrastive learning (paraphrase/noise/segment jitter).
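As a concreteness check, a minimal latentizer sketch (assuming a `segment`/`latentize` split; the `encode` callable stands in for GTR‑T5, and the real segmenter also uses clause punctuation, not just word counts):

```python
import numpy as np

def segment(text, words_per_concept=17):
    """Greedy segmentation: ~17 words per concept (heuristic target from the PRD;
    production also splits on clause punctuation)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_concept])
            for i in range(0, len(words), words_per_concept)]

def latentize(segments, encode, L=8, d=768):
    """Encode segments, unit-normalize, then pad/truncate to length L."""
    vecs = np.stack([encode(s) for s in segments])[:L]
    vecs = vecs / np.linalg.norm(vecs, axis=-1, keepdims=True)
    out = np.zeros((L, d), dtype=np.float32)  # zero rows act as padding
    out[:len(vecs)] = vecs
    return out
```

The two stochastic views for contrastive learning would be produced by running `segment` twice with jittered boundaries and/or noised encodings.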
Backbone & Heads
- Replace token embedding with bridge(768→512).
- Keep Mamba/Transformer blocks; minimal positional scheme (index‑based/rotary).
- Heads: (a) continuous next‑latent prediction; (b) optional multi‑horizon.
Decoding & OOV
- ANN cosine over concept store; if top‑1 cosine < τ, call vec2text; upsert (text, vector, meta) back to store.
Evaluation
- Text: BLEU‑4, ROUGE‑L between Expected Text and Output Text.
- Vector: cosine(GTR‑T5(Expected Text), ẑ) and angular error.
5) Non‑Functional Requirements
Stability: No collapse; rising InfoNCE accuracy; healthy batch variance.
Latency: p95 budget set and tracked with/without fallback.
Scale: Sharded ANN; online inserts ≤ 50 ms p95; read‑after‑write ≤ 1 s.
Reproducibility: Seeded segmentation; pinned model versions; manifest files.
Safety: Bounded generation length; deterministic fallback; observability.
6) Architecture
Input: Text → (optional) GTR‑T5 → 768D vectors → Latentizer packs sequences.
Core: Bridge 768→512 → Hybrid Mamba/Transformer → Heads predict next latent(s).
Output: 768D vector → ANN cosine decode → text; vec2text fallback if needed → store insert.
Data Flow (training): Expected Text → GTR‑T5 → z* (target vector) and concept sequence → model → z^ → metrics.
7) Data Strategy
Datasets: Use the same corpora as the reference model where allowed; ensure doc‑level split (80/10/10).
Segmentation: Punctuation + heuristics; optional small learned splitter; enforce target granularity (~17 words/concept).
Packing: Sliding windows with overlap; maintain doc boundaries for leakage control.
Normalization: Unit‑norm on all vectors pre/post bridge; mean‑free if needed.
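The packing step above reduces to a plain sliding window; a sketch (the `pack_windows` name is illustrative, and calling it once per document is what enforces the doc-boundary rule):

```python
def pack_windows(concepts, L, overlap):
    """Sliding windows of length L with the given overlap; invoke per
    document so windows never cross doc boundaries (leakage control)."""
    assert 0 <= overlap < L
    stride = L - overlap
    return [concepts[i:i + L]
            for i in range(0, max(1, len(concepts) - overlap), stride)]
```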
8) Model Specification
Bridge: Linear(768→512) + LayerNorm + GELU (optional) + residual projection path.
Positional: Learned index bias or rotary on latent index.
Heads:
- AR next‑step: Predict x_{t+1} (512D), then project to 768D for metrics/decoding.
- Multi‑horizon: Additional heads for t+k, k ∈ {4, 16}.
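A numpy sketch of the bridge, following the §8 ordering (Linear → LayerNorm → GELU); the production module would be a `torch.nn.Module`, and the residual projection path is simplified away:

```python
import numpy as np

class Bridge:
    """768→512 bridge: Linear, then LayerNorm (no affine), then tanh-approx GELU."""
    def __init__(self, d_in=768, d_out=512, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, d_in ** -0.5, (d_in, d_out))
        self.b = np.zeros(d_out)

    def __call__(self, x):
        h = x @ self.W + self.b                                           # Linear 768→512
        h = (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-5)  # LayerNorm
        return 0.5 * h * (1 + np.tanh(0.7978845608 * (h + 0.044715 * h**3)))     # GELU
```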
9) Objectives & Losses (Anti‑Collapse)
Autoregressive (AR): L_AR = α·(1 − cos(x̂, x)) + (1 − α)·‖x̂ − x‖₂², with α = 0.7.
Masked Latent Modeling (MLM): 15–30% random masks; L2/cos reconstruction.
Contrastive (InfoNCE): Two views per sequence; temperature τ=0.07; large in‑batch negatives.
VICReg/Barlow Twins: Variance floor γ=1.0 after unit‑norm; decorrelate off‑diagonals.
Scheduled Sampling: Introduce 10–30% free‑running over epochs.
Default weights: λ_AR = 1.0, λ_MLM = 0.5, λ_NCE = 0.5, λ_VIC = 0.04.
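Two of the losses above, sketched in numpy to pin down the formulas (the names `ar_loss` and `variance_floor` are illustrative):

```python
import numpy as np

def ar_loss(pred, target, alpha=0.7):
    """L_AR = α·(1 − cos(x̂, x)) + (1 − α)·‖x̂ − x‖², averaged over the batch."""
    cos = np.sum(pred * target, -1) / (
        np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1))
    l2 = np.sum((pred - target) ** 2, -1)
    return np.mean(alpha * (1.0 - cos) + (1.0 - alpha) * l2)

def variance_floor(z, gamma=1.0):
    """VICReg-style hinge: penalize per-dimension batch std below γ."""
    std = z.std(axis=0)
    return np.mean(np.maximum(0.0, gamma - std))
```

Note that for unit-norm 768D vectors the per-dim std is well below 1, so with γ = 1.0 this term is always active; it pushes against collapse rather than to zero.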
10) Training Protocol
Curriculum on L: 1k → 2k → 4k → 8k (advance only after stability checkpoints).
Phases:
- A: MLM + denoise @ L=1k
- B: +AR @ L=2k
- C: +InfoNCE + multi‑horizon @ L=4k
- D: Scheduled sampling; scale to L=8k
Batching: Target global batch ≈ 1000 sequences via grad accumulation.
Regularization: Dropout 0.1; stochastic depth 0.1–0.2; grad clip 1.0; EMA teacher for contrastive target.
Precision: BF16/FP16 mix; activation checkpointing; ZeRO/sharding as needed.
11) Inference & Decoding
Step Output: 768D unit‑norm vector.
ANN Decode: FAISS/HNSW/ScaNN; top‑K = 32; fusion score s = λ·cos + (1 − λ)·step_logit, λ = 0.7.
Threshold τ: Start at 0.85; if top‑1 < τ → vec2text; cache and upsert new pair.
Caching: Per‑session LRU; prefetch neighbors around prior steps.
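A minimal end-to-end decode sketch, assuming exact cosine search as a stand-in for the FAISS/HNSW index; the λ-fusion re-rank over per-candidate step logits is omitted, and `fallback` stands in for vec2text:

```python
import numpy as np

class ConceptStore:
    """Exact-cosine stand-in for the sharded ANN index."""
    def __init__(self, d=768):
        self.vecs = np.zeros((0, d), dtype=np.float32)
        self.texts = []

    def upsert(self, vector, text):
        v = vector / np.linalg.norm(vector)
        self.vecs = np.vstack([self.vecs, v[None]])
        self.texts.append(text)

    def query(self, vector, k=32):
        v = vector / np.linalg.norm(vector)
        cos = self.vecs @ v
        idx = np.argsort(-cos)[:k]
        return [(self.texts[i], float(cos[i])) for i in idx]

def decode(store, vec, tau=0.85, fallback=None):
    """ANN decode with vec2text fallback and online insert (per §11)."""
    hits = store.query(vec)
    if hits and hits[0][1] >= tau:
        return hits[0][0]
    text = fallback(vec)        # vec2text stand-in
    store.upsert(vec, text)     # bootstrap the store online
    return text
```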
12) Evaluation (Acceptance Gates)
Text: BLEU‑4, ROUGE‑L between Expected Text and Output Text.
Vector: cosine(z*, ẑ) and angular error; z* = GTR‑T5(Expected Text).
Targets (v1): cosine avg ≥ 0.85 (p90 ≥ 0.80); BLEU‑4 ≥ 0.25; ROUGE‑L ≥ 0.45; fallback ≤ 10% and trending down.
No‑collapse: Per‑dim batch stddev ≥ γ; InfoNCE retrieval ↑; stable losses.
Reporting: Mean/median/std/95% CI; p50/p95 latency; OOV trend; domain slices.
13) Concept Store (ANN) Requirements
API: query(vector,K) → {id, cosine, text}; upsert(id?, vector, text, meta); get(id).
Index: Cosine; unit‑norm; HNSW for high recall or IVF‑PQ for memory tradeoffs.
Scale: Sharded by id/time; background reindex.
Consistency: Read‑after‑write ≤ 1 s.
Observability: Hit‑rate, τ‑failures, tail latency, growth, dedup stats.
14) Configuration Defaults
| Parameter | Default | Notes |
|---|---|---|
| Input dim | 768 | GTR‑T5 compatible |
| Bridge dim | 512 | compressive bottleneck |
| Context L | 1k → 2k → 4k → 8k | curriculum |
| Global batch | ~1000 sequences | via grad accumulation |
| Mask ratio | 0.2 | MLM |
| InfoNCE τ | 0.07 | temperature |
| Cos threshold τ | 0.85 | fallback trigger |
| ANN top‑K | 32 | re‑rank window |
| Fusion λ | 0.7 | cos vs. logit |
| Dropout | 0.1 | |
| Stoch. depth | 0.1–0.2 | |
| Grad clip | 1.0 | |
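These defaults can be captured in a single config object for reproducible runs (field names are illustrative, not a mandated schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatentModelConfig:
    """Defaults from §14; frozen so a run's config cannot drift mid-training."""
    input_dim: int = 768
    bridge_dim: int = 512
    context_schedule: tuple = (1024, 2048, 4096, 8192)  # curriculum on L
    global_batch: int = 1000
    mask_ratio: float = 0.2
    infonce_tau: float = 0.07
    cos_threshold: float = 0.85   # vec2text fallback trigger
    ann_top_k: int = 32
    fusion_lambda: float = 0.7
    dropout: float = 0.1
    grad_clip: float = 1.0
```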
15) APIs (Sketch)
Training Latentizer
POST /latentize → {sequence_id, latents:[vec768...], meta}
Inference
POST /step → {vec_in_768} → {vec_out_768, score}
POST /decode → {vec_out_768} → {text, cosine, source}
Concept Store
POST /ann/query {vector, K} → {items:[{id, cosine, text}]}
POST /ann/upsert {id?, vector, text, meta} → {id}
16) Implementation Plan
Week 1: Latentizer v0; bridge; control vectors; tiny model @ L=1k trains with MLM+denoise.
Week 2: Add AR; cosine ≥ 0.80 on small eval; BLEU‑4 ≥ 0.20 / ROUGE‑L ≥ 0.40.
Week 3: Add InfoNCE + k‑head; L=2k/4k; integrate ANN decode (no fallback), dashboards.
Week 4: Add vec2text fallback + online inserts; L=4k; fallback ≤ 20%.
Week 5–6: Scale to L=8k; meet v1 gates; harden infra & docs; freeze test set.
Deliverables: Code modules (latentizer/bridge/model/heads/eval), ANN service, eval suite, runbooks, manifests.
17) Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Representation collapse | Training fails | VICReg variance floor; InfoNCE w/ many negatives; multi‑horizon; EMA teacher; monitor stats |
| Long‑context instability | Convergence issues | Curriculum; LR schedule; checkpointing; grad clip |
| High fallback rate | Latency & cost | τ tuning; cache; online inserts; grow store; re‑rank top‑K |
| Mis‑alignment 512↔768 | Metric drop | Careful projections; unit‑norm; angular error monitoring |
| ANN drift/dupes | Decode errors | Cosine+text hash dedup; periodic compaction; QA |
18) Acceptance Criteria (Go/No‑Go)
Meets v1 targets (§12) on frozen test set.
Fallback ≤ 10% and dropping; p95 latency within budget.
No collapse indicators; stable training at L=8k.
Reproducible runs with seeded segmentation and pinned versions.
19) Extensions (Post‑v1)
VQ discrete latents (codebook entropy/usage regularizers).
Differentiable segmenter; Sequential‑GPS positions; TMD channelization.
Reranker over top‑K neighbors using internal scores.
20) Appendix
A. Metric Computation
Cosine: cos(z*, ẑ) = (z* · ẑ) / (‖z*‖ ‖ẑ‖); report angular error arccos(cos).
Text: sacreBLEU (BLEU‑4), ROUGE‑L.
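The cosine/angular-error computation as a self-contained helper (illustrative name; BLEU/ROUGE come from sacreBLEU and a standard ROUGE package):

```python
import numpy as np

def cosine_and_angle(z_star, z_hat):
    """Return cos(z*, ẑ) and the angular error arccos(cos) in radians."""
    cos = float(np.dot(z_star, z_hat) /
                (np.linalg.norm(z_star) * np.linalg.norm(z_hat)))
    return cos, float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding
```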
B. Minimal Pseudocode (Training)
x768 = latentizer(text_batch) # [B,L,768], unit-norm
x = bridge(layernorm(x768)) # [B,L,512]
y = model(x) # predicts next and k-step latents
loss = L_ar + L_mlm + L_infoNCE + L_vic
loss.backward(); clip_grad_norm_(params,1.0)
opt.step(); opt.zero_grad()
C. Decoding Logic
neighbors = ann.query(vec_out_768, topk=32)
if neighbors[0].cosine >= tau:
    text = neighbors[0].text
else:
    text = vec2text(vec_out_768)
    ann.upsert(vector=vec_out_768, text=text, meta=...)
End of PRD v1.0