DRH Dedicated Trainer
Reference · General AI Theory


2025-08-04 · 4 min read · 654 words

To make this crystal clear, here's an ASCII art diagram showing the high-level flow for training and inference. I'll use a simplified view of your pyramid model (768D input → compression → 384D nuclear → expansion → 768D output), with DRH attached to the nuclear bottleneck.
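To make the shapes concrete, here is a minimal NumPy sketch of that pyramid with a DRH head attached at the 384D nuclear bottleneck. The dense layers, initialization, and variable names are illustrative assumptions, not the project's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dense layers for the pyramid: 768D -> 384D (nuclear) -> 768D.
W_compress = rng.normal(0, 0.02, (768, 384))  # compression: 768D -> 384D nuclear
W_expand   = rng.normal(0, 0.02, (384, 768))  # expansion: 384D -> 768D output
W_drh      = rng.normal(0, 0.02, (384, 768))  # DRH: reconstructs teacher-like 768D

teacher = rng.normal(0, 1, (5, 768))          # batch of 5 teacher vectors

nuclear = np.tanh(teacher @ W_compress)       # 384D bottleneck activations
output  = nuclear @ W_expand                  # main 768D output path
recon   = nuclear @ W_drh                     # DRH reconstruction path

print(nuclear.shape, output.shape, recon.shape)  # -> (5, 384) (5, 768) (5, 768)
```

The DRH shares the 384D nuclear input with the expansion path but is trained separately to emit teacher-like 768D vectors.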

Key Benefits of the Streamlined Script:

✅ Clean Output - No more verbose debug logs

✅ Focused Metrics - Only the 3 key metrics you need

✅ Quality Assessment - Automatic POOR/MODERATE/EXCELLENT rating

✅ Training Sample Detection - Automatically extracts sample count from checkpoint name

✅ Real Teacher Vectors - Encodes real sentences with gtr-t5-base for realistic testing

✅ Consistent Testing - Same 5 test sentences every time for fair comparison
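The core number behind the quality assessment is cosine similarity between the DRH reconstruction and its ground-truth teacher vector. Here is a minimal NumPy sketch of that metric; the other two metrics are shown as MSE and Euclidean distance, which is an assumption since the script's exact metric set isn't listed above:

```python
import numpy as np

def eval_metrics(recon: np.ndarray, teacher: np.ndarray) -> dict:
    """Compare a reconstructed 768D vector against its teacher vector."""
    cos = float(recon @ teacher / (np.linalg.norm(recon) * np.linalg.norm(teacher)))
    mse = float(np.mean((recon - teacher) ** 2))
    l2  = float(np.linalg.norm(recon - teacher))
    return {"cosine": cos, "mse": mse, "l2": l2}

v = np.ones(768)
print(eval_metrics(v, v))  # identical vectors: cosine ≈ 1.0, mse 0.0
```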

Expected Improvement Trajectory:

Train, then re-run the evaluation script against the resulting checkpoint:

./venv/bin/python3 -m app.utils.drh_dedicated_trainer --samples 5000

./venv/bin/python3 tests/trace_drh_reconstruction.py inputs/projects/Project_V1p8_80325_768-384-512-384-ProjectionDRH.json --auto-checkpoint

  • Random/Untrained: Cosine similarity ~0.0 to -0.1 (🔴 POOR)
  • 500 samples: Target >0.1 cosine similarity
  • 1000 samples: Target >0.3 cosine similarity (🟡 MODERATE)
  • 5000 samples: Target >0.5 cosine similarity (🟢 EXCELLENT)

The script is now optimized for measuring DRH training benefit: run it before and after each training session to track improvement quantitatively.
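Those bands can be captured in a small helper. This is a sketch built from the thresholds above; the function name and exact boundary handling are assumptions, not the script's code:

```python
def rate_quality(cosine_sim: float) -> str:
    """Map a cosine similarity score onto the POOR/MODERATE/EXCELLENT bands above."""
    if cosine_sim > 0.5:
        return "🟢 EXCELLENT"
    if cosine_sim > 0.3:
        return "🟡 MODERATE"
    return "🔴 POOR"

print(rate_quality(-0.05))  # untrained model -> 🔴 POOR
print(rate_quality(0.55))   # 5000-sample target -> 🟢 EXCELLENT
```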

    Training Flow (Generating Pairs and Training DRH)

    +---------------+     +-----------------+     +-------------------+
    | Input Data    | --> | Generate Pairs  | --> | Train Model       |
    | (triplets     |     | (20K samples)   |     | (full model incl. |
    |  .json)       |     |                 |     |  DRH)             |
    +---------------+     +-----------------+     +-------------------+

    Generate Pairs (run once):
      1. Load triplets.json
      2. Encode anchor/negative text to 768D teacher vectors (GTR-T5-base)
      3. Run forward pass on model to get 384D nuclear vectors
      4. Save pairs [nuclear, teacher, negative] in a .pt file

    Train Model:
      1. Load Project.json
      2. Create model w/ DRH
      3. Load existing .pth (if fine-tuning)
      4. For each epoch:
           - Forward pass on pairs to get recon
           - Compute MSE + Cosine + Contrastive Loss
           - Backward & update
      5. Save updated .pth (single file w/ DRH)

                             |
                             v
                  +----------------------+
                  | Trained Checkpoint   |
                  | (.pth w/ DRH)        |
                  +----------------------+
  • Step 1 (Generate Pairs): One-time run to create drh_training_pairs.pt from triplets.json (anchor as positive teacher, negative as negative teacher). This uses GTR-T5-base for teacher vectors and the model's forward pass for nuclear vectors. No DRH involved yet.
  • Step 2 (Train): Load the Project.json to create the model (with DRH), load an existing .pth if fine-tuning, and train on the pairs. The DRH learns to reconstruct nuclear (384D) to teacher-like 768D. Save a single .pth that includes the full model + trained DRH weights.
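The per-epoch loss described in Step 2 (MSE + cosine + contrastive) can be sketched as follows. This is a NumPy illustration of the loss terms only, not the trainer's actual code; the margin value and the equal weighting of the three terms are assumptions:

```python
import numpy as np

def drh_loss(recon, teacher, negative, margin=0.2):
    """Combined loss: MSE + cosine distance to the teacher + a triplet-style
    contrastive term pushing recon toward teacher and away from the negative."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    mse = float(np.mean((recon - teacher) ** 2))
    cosine_loss = 1.0 - cos(recon, teacher)
    # Positive similarity should beat negative similarity by at least `margin`.
    contrastive = max(0.0, margin - (cos(recon, teacher) - cos(recon, negative)))
    return mse + cosine_loss + contrastive

t = np.ones(768)      # teacher vector
n = -np.ones(768)     # negative teacher vector
print(drh_loss(t, t, n))  # ≈ 0.0 for a perfect reconstruction
```

The contrastive term is what targets the low-cosine issue: MSE alone can be small while directions still disagree, whereas this term directly rewards angular separation from negatives.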
    Inference Flow (Using the Trained DRH)

    +---------------+     +------------------+     +----------------------+
    | Input Text    | --> | Encode to 768D   | --> | Run Full Model       |
    | (e.g., SciQ   |     | (GTR-T5-base or  |     | (load .pth w/ DRH)   |
    |  question)    |     |  cloud API)      |     |                      |
    +---------------+     +------------------+     +----------------------+

    Per inference query:
      1. Compress to 384D nuclear
      2. Pass nuclear through DRH to get reconstructed 768D
      3. Use reconstructed 768D for cloud lookup or vec2text

                            |
                            v
                   +--------------------+
                   | Output Text        |
                   | (from cloud match) |
                   +--------------------+
  • Inference: Load the trained .pth (which has DRH), encode input text to 768D (GTR-T5-base or cloud API), run through the model to get nuclear output, then through DRH to get reconstructed 768D. Query your cloud 10M+ table with the reconstructed vector for the closest text match (no ground truth needed).
This way, there's only one checkpoint (.pth) at the end: the full model with the trained DRH. The "upgraded DRH training" simply focuses on optimizing that component, using the contrastive loss to fix the low-cosine issue.
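The cloud-lookup step at inference time reduces to a nearest-neighbor search over the table's stored vectors. Here is a minimal in-memory sketch using brute-force cosine search; the real 10M+ table would sit behind an ANN index or database query, and all names here are illustrative:

```python
import numpy as np

def nearest_text(recon: np.ndarray, table_vecs: np.ndarray, table_texts: list) -> str:
    """Return the text whose stored 768D vector is most cosine-similar to recon."""
    norms = np.linalg.norm(table_vecs, axis=1) * np.linalg.norm(recon)
    sims = table_vecs @ recon / norms
    return table_texts[int(np.argmax(sims))]

# Toy table of three entries (rows of an identity matrix padded to 768D).
vecs = np.eye(3, 768)
texts = ["alpha", "beta", "gamma"]
query = np.zeros(768); query[1] = 1.0
print(nearest_text(query, vecs, texts))  # -> beta
```

Because the lookup uses only the reconstructed vector and the stored table, no ground-truth teacher vector is needed at inference time, matching the flow above.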
