VAK: The Verdict Autonomy Kernel

The World's First Metered, Explainable, Reversible Autonomous Development Organization

Doc ID: PRD_VAK_v1.2
Status: PRD / Pre-Architecture
Date: 2026-02-20
Version: 1.2 (integrates trust-labeled dataflow, three-layer trust model, forensic vault, sandbox lifecycle, out-of-the-loop safety, and reliance calibration metrics)
Authors: Trent (Lead Consultant) + Claude (Architect Partner)
Classification: Strategic, Foundational IP

Changelog

| Version | Date | Summary |
| --- | --- | --- |
| 1.0 | 2026-02-19 | Initial vision document |
| 1.1 | 2026-02-19 | Integrated security research, trust psychology, Receipt architecture, attacker playbooks, trust-first phasing |
| 1.2 | 2026-02-20 | Pervasive taint tracking, three-layer Initiative Budget, dual-layer Receipt model with forensic vault, sandbox lifecycle engineering, out-of-the-loop safety principle, reliance calibration metrics, NIST Trustworthy AI alignment |

The One-Sentence Pitch

VAK is a virtual organization, not a chatbot, that designs, builds, deploys, and operates your software 24/7, within a budget you set and rules you define, escalating to humans only when it should.

1. The Paradigm Shift

1.1 What Exists Today

The current landscape of autonomous AI agents treats autonomy as a permission toggle. On or off. Sandbox or no sandbox. The copycats responded to the resulting security incidents by making the cage stronger. Nobody asked: _what if the animal didn't need a cage because it had judgment?_

The market proved overwhelming demand for autonomous agents. It simultaneously proved the inevitability of trust collapse when autonomy ships without professional governance, isolation, and audit. Enterprise bans follow incidents within weeks; the "adoption → incident → ban" loop is now a documented pattern.

The copycat wave clusters around four axes (auditability-first, isolation-first, footprint-first, workflow-first), but no single axis is sufficient. The sustainable solution unifies all axes into a trust-native system.
VAK makes autonomy feel _professional._

1.2 What VAK Actually Is

VAK is not an agent. VAK is not a bot. VAK is not a copilot. VAK is a Virtual Executive Officer (VEO): a complete parallel branch of your organization that operates continuously, asynchronously, and independently alongside the human CEO branch. It doesn't assist your team. It _is_ a team.

┌──────────────────────────────────────────────────────────────┐
│                      YOUR ORGANIZATION                       │
│                                                              │
│   CEO Branch (Human)              VEO Branch (VAK)           │
│   ┌─────────────┐                 ┌─────────────────┐        │
│   │     CEO     │◄───────────────►│   VAK Kernel    │        │
│   └──────┬──────┘      async      └────────┬────────┘        │
│          │            collab               │                 │
│   ┌──────┴──────┐                 ┌────────┴────────┐        │
│   │ Human Teams │                 │ Agent Teams     │        │
│   │ Design      │                 │ Design Swarm    │        │
│   │ Engineering │                 │ Dev Swarm       │        │
│   │ DevOps      │                 │ Deploy Swarm    │        │
│   │ Support     │                 │ Ops Swarm       │        │
│   └─────────────┘                 └─────────────────┘        │
│                                                              │
│   Budget: $X/month                Budget: $Y/month           │
│   Hours: 9-5 M-F                  Hours: 24/7/365            │
│   Latency: hours-days             Latency: seconds-minutes   │
└──────────────────────────────────────────────────────────────┘

The VEO branch doesn't replace the CEO branch. It extends the clock. While humans sleep, VAK triages support tickets, monitors production health, drafts PRD updates from user feedback patterns, and prepares morning briefings with recommended actions. While humans are in meetings, VAK is running regression suites, evaluating dependency CVEs, and pre-staging hotfix branches.

1.3 The Four Formations

VAK doesn't have "modes." It has formations: organizational configurations that reflect the lifecycle phase of the project. The same kernel, the same agents, the same budget system, reconfigured for the mission.

| Formation | Lifecycle Phase | Primary Activity | Human Touchpoint |
| --- | --- | --- | --- |
| Design Formation | Pre-development | SOW, TRD, PRD, SPEC generation; architecture decisions; research synthesis; competitive analysis | Async reviews via messaging; approval gates on key decisions |
| Development Formation | Active build | Code generation, testing, PR management, documentation sync, dependency management | PR reviews; architecture escalations; sprint standups |
| Deployment Formation | Release & ship | CI/CD orchestration, staging validation, canary rollouts, rollback readiness, release notes | Go/no-go approvals; incident response handoff |
| Operations Formation | Production | Health monitoring, log analysis, support triage, billing oversight, P&L tracking, community management, upgrade planning | Morning briefings; escalation handling; budget adjustments |

The transition between formations is itself autonomous. When the last PR merges and tests pass, VAK doesn't wait for a human to say "okay, now deploy." It shifts to Deployment Formation, runs the deployment playbook, and, if the Initiative Budget permits, executes it. If the budget is insufficient for that level of autonomy, it prepares everything and pings the human: _"Deployment staged. Green across all checks. Approve?"_

2. The Science of Earned Trust: Why the Initiative Budget Works

2.1 The Trust-Automation Research Foundation

The Initiative Budget isn't a clever product feature; it's the engineering implementation of decades of human factors research on how humans actually calibrate reliance on automated systems. Lee & See's foundational work establishes that humans don't make binary "trust/distrust" decisions. They continuously adjust reliance based on perceived competence, predictability, and transparency. Parasuraman & Riley's earlier taxonomy of automation use identifies three systematic failure modes:

| Failure Mode | Definition | What Happens Without VAK | VAK Prevention |
| --- | --- | --- | --- |
| Misuse (overtrust) | Relying on automation beyond its competence | Users grant full system access → credential theft | Initiative Budget caps autonomy to evidence-backed competence level |
| Disuse (undertrust) | Failing to use beneficial automation | Enterprise bans → losing all AI agent value | Flight Recorder makes every action inspectable → trust can be rebuilt |
| Abuse (misdesign) | Deploying automation without regard for consequences | "Vibe coding" backends → database exposure | Capability Contracts enforce least-privilege by design |

2.2 The Three-Layer Trust Model

Hoff & Bashir's research reveals that trust in automation operates across three distinct layers, each varying independently:

| Trust Layer | What It Is | Time Scale | VAK Mechanism |
| --- | --- | --- | --- |
| Dispositional | User's baseline personality toward automation; some people naturally trust more, others less | Stable (personality trait) | VAK cannot control this, but the Flight Recorder's transparency reduces the penalty of low dispositional trust by providing evidence |
| Situational | Current context and stakes: is production down? Is there a deadline? Has the team been burned recently? | Minutes to days | Situational Budget Component: contextual factors that modulate the Initiative Budget in real time |
| Learned | Accumulated track record: this system has succeeded or failed at similar tasks before | Weeks to months | Learned Budget Component: the evidence accumulation, decay, and crash mechanisms |

Design implication: The Initiative Budget must be a composite of at least two independently varying components, not a single scalar. The learned component tracks historical competence per action category. The situational component responds to current conditions: system health, recent incidents, human stress signals (rapid overrides, escalation frequency), and environmental risk (deployment freeze windows, compliance audit periods). This makes the budget more responsive to reality and harder to game: a high learned score doesn't override a dangerous situational context.

2.3 The Out-of-the-Loop Safety Principle

Endsley & Kaber's research on levels of automation reveals a critical safety constraint: intermediate autonomy levels are not stepping stones to full autonomy; they are a permanent safety valve. When humans are reduced to passive monitors of opaque automation, they lose _situational awareness_: the ability to understand what the system is doing, predict what it will do next, and intervene effectively when things go wrong. This is the "out-of-the-loop" problem, and it's the reason airline autopilot systems maintain continuous pilot awareness displays even during fully automated flight.

For VAK, this means: Even when VAK is operating at Rung 3 or 4 of the Autonomy Ladder (§3.2), Rungs 1 and 2 must continue generating output. The human should always be able to see:

What VAK is planning to do next (Rung 1: suggestions for upcoming actions)

What VAK has drafted but not yet executed (Rung 2: patches, PRDs, configs in review state)

What VAK is currently executing (Rung 3-4: live status with intervention controls)

The Flight Recorder partially addresses this, but the principle goes further: the human must never become a passive observer of VAK's automation. They must always maintain enough context to intervene meaningfully. The Concierge Channel's morning briefing, the Flight Recorder's progressive disclosure, and the "Pause" control are all implementations of this principle, and they must be treated as safety-critical features, not convenience features.

2.4 Trust Calibration Is Continuous

A crucial nuance from the research: trust is not "earned once." It is learned, situational, and decays when conditions change or when automation behaves unexpectedly. The Initiative Budget's decay, failure crash, and circuit breakers are therefore not product heuristics; they are consistent with how humans recalibrate reliance on imperfect automation. A system that earns trust and then keeps it forever would violate the research on how trust actually works in human-automation teams.

3. The Initiative Budget: VAK's Core Innovation

3.1 Composite Budget Architecture

The Initiative Budget is a continuously computed trust score that determines what VAK can do without asking. It is computed per action category as a composite of two independently varying components, capped by a policy ceiling:

Initiative Budget(category) = min(
    min(
        Learned Component(category),
        Situational Component(context)
    ),
    Policy Ceiling(category)
)

Where:

Learned Component = f(
    track_record,          # Historical success rate in this category
    evidence_strength,     # Quality of evidence (tests, reviews, approvals)
    action_reversibility,  # How easily can outcomes be undone
    data_sensitivity,      # PII? Secrets? Production data?
    cost_history           # Historical cost accuracy of estimates
)

Situational Component = f(
    system_health,         # Is production healthy right now?
    recent_incidents,      # Have there been failures in the last 24h?
    human_override_rate,   # Is the human overriding frequently? (stress signal)
    environmental_risk,    # Deployment freeze? Compliance audit? Holiday?
    time_criticality       # Is production down? Is there a deadline?
)

Policy Ceiling = project ACL maximum for this category
                 # No amount of earned trust exceeds the policy ceiling
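
As a minimal sketch of the composite above, assuming each component has already been reduced to a score in [0, 1] (that scale, the `BudgetInputs` class, and the `initiative_budget` function are illustrative names, not part of this PRD), the computation could look like the following. The policy ceiling is read here as a hard cap, following the note that no amount of earned trust exceeds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetInputs:
    learned: float         # Learned Component(category); assumed to lie in [0, 1]
    situational: float     # Situational Component(context); assumed to lie in [0, 1]
    policy_ceiling: float  # Project ACL maximum for the category


def initiative_budget(inputs: BudgetInputs) -> float:
    """Take the weaker of the two components, then cap it at the policy ceiling."""
    composite = min(inputs.learned, inputs.situational)
    return min(composite, inputs.policy_ceiling)


# Strong track record, but a production incident is in progress.
during_incident = BudgetInputs(learned=0.9, situational=0.2, policy_ceiling=0.8)
# Same track record under healthy conditions: the ceiling becomes the binding limit.
healthy = BudgetInputs(learned=0.9, situational=0.95, policy_ceiling=0.8)

print(initiative_budget(during_incident))  # 0.2
print(initiative_budget(healthy))          # 0.8
```

How the learned and situational scores are derived from their inputs (the two `f(...)` functions) is left open here, as it is in the formula.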
Why min() and not average(): The composite uses the minimum of the two components because, consistent with the trust research in §2, a dangerous situation should override a good track record. A system that has successfully deployed 50 times shouldn't auto-deploy during a production incident, even though its learned component is high. The min() function encodes the principle: _earned competence does not override bad conditions._

3.2 The Autonomy Ladder (4 Rungs)

The Initiative Budget maps to four execution rungs. VAK always operates at the highest rung its current budget permits for a given action:

| Rung | Name | Budget Threshold | What VAK Does | Human Experience |
| --- | --- | --- | --- | --- |
| 1 | Suggest | Low | Generates plan + command preview; does not execute anything | "VAK recommends upgrading fastapi to 0.115.0. Here's the impact analysis and proposed diff." |
| 2 | Draft | Medium | Writes patches, PRDs, configs, but doesn't apply them | "VAK drafted a hotfix for the rate limiter bug. Branch vak/hotfix-ratelimit-001 ready for review." |
| 3 | Execute in Sandbox | High | Runs commands, tests, builds in an isolated container workspace | "VAK executed the migration in sandbox. All 847 tests pass. Diff attached. Promote to host?" |
| 4 | Execute on Host | Very High | Applies changes to the actual project with scoped mounts and secrets | "VAK deployed v2.4.1 to staging. Canary metrics nominal after 15 minutes. Production deploy queued pending your approval." |

Rung promotion is _earned_, not configured. A fresh VAK instance starts at Rung 1 for everything. As it accumulates evidence (successful sandbox runs, approved PRs, passing tests, positive human feedback), its learned component for specific action categories grows. VAK naturally becomes more autonomous _in the areas where it has proven competent_, while remaining conservative in unfamiliar territory.

Out-of-the-loop invariant: Even when executing at Rung 3-4, VAK continuously generates Rung 1-2 output for upcoming and parallel actions. The human always has a preview of what's coming next.

3.3 Budget Decay and Circuit Breakers

Trust isn't permanent. The Initiative Budget decays over time and crashes on failures:

Learned decay: Budget for an action category decreases by ~5% per week of inactivity (use it or lose it — if VAK hasn't deployed in a month, it shouldn't auto-deploy)

Situational volatility: The situational component can change in seconds (production incident detected → situational component drops immediately across all categories)

Failure crash: A failed autonomous action at Rung 3-4 immediately drops that category's learned component to Rung 1 and triggers a post-mortem report

Human override: Any human "reject" on a VAK action reduces the learned component proportionally and may also signal a situational trust reduction (if overrides are clustering)

Circuit breaker: 3 failures in the same category within 24 hours locks VAK to Rung 1 for that category until human review
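
The decay, failure-crash, and circuit-breaker rules above can be sketched as a small state object. This is an illustration under stated assumptions, not the kernel's implementation: the decay is modeled as compounding at 5% per idle week (the text only says "~5% per week"), a score of 0.0 is assumed to correspond to Rung 1, and the `CategoryTrust` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

WEEKLY_DECAY = 0.05                    # "~5% per week of inactivity"
BREAKER_FAILURES = 3                   # Failures that trip the circuit breaker
BREAKER_WINDOW = timedelta(hours=24)   # Window in which those failures must fall
RUNG_1_SCORE = 0.0                     # Assumption: a score that only permits Rung 1


@dataclass
class CategoryTrust:
    learned: float
    failures: List[datetime] = field(default_factory=list)
    locked: bool = False               # Stays True until a human review clears it

    def decay(self, idle_weeks: float) -> None:
        """Compound the weekly decay over a period of inactivity."""
        self.learned *= (1.0 - WEEKLY_DECAY) ** idle_weeks

    def record_failure(self, when: datetime) -> None:
        """Failure crash, followed by the circuit-breaker check."""
        self.learned = RUNG_1_SCORE
        self.failures.append(when)
        recent = [t for t in self.failures if when - t <= BREAKER_WINDOW]
        if len(recent) >= BREAKER_FAILURES:
            self.locked = True

    def clear_after_review(self) -> None:
        """Called once a human has reviewed the failures."""
        self.locked = False
        self.failures.clear()


trust = CategoryTrust(learned=0.8)
trust.decay(idle_weeks=4)
print(round(trust.learned, 3))  # 0.652
```

The proportional reduction on human "reject" is omitted because the text does not specify the proportion.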
3.4 Anti-Gaming Protections
Evidence must be externally verifiable: Test results come from CI, not VAK self-assessment. PR approvals come from human reviewers. Production health comes from monitoring infrastructure.

Budget computation is cryptographically signed: Inputs to the budget function are recorded in the Receipt Store (§8) with hashes. Retroactive manipulation requires forging receipts.

Policy ceilings are immutable at runtime: No amount of earned trust exceeds the ceiling set by the project ACL. The policy is set by humans, not by VAK.

Situational component is externally derived: System health, incident status, and environmental risk come from infrastructure signals, not from VAK's own assessment of conditions.

Independent audit: The Flight Recorder (§10) provides a complete trace of every budget computation, including both components, enabling periodic calibration review.
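
One way to picture "inputs are recorded with hashes" is the sketch below. It uses an HMAC from the Python standard library as a stand-in for the runtime's signature; a real deployment would more likely use an asymmetric signature, and the function names and the sample input fields are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def canonical_bytes(inputs: dict) -> bytes:
    """Serialize deterministically so the same inputs always hash the same way."""
    return json.dumps(inputs, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_budget_inputs(inputs: dict, key: bytes) -> dict:
    """Return the hash and keyed signature that would be stored with a receipt."""
    payload = canonical_bytes(inputs)
    return {
        "inputs_hash": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }


def verify_budget_inputs(inputs: dict, seal: dict, key: bytes) -> bool:
    """Recompute the seal and compare in constant time."""
    expected = seal_budget_inputs(inputs, key)
    return (
        hmac.compare_digest(expected["inputs_hash"], seal["inputs_hash"])
        and hmac.compare_digest(expected["signature"], seal["signature"])
    )


key = b"demo-key-not-for-production"  # Placeholder key for the example only
inputs = {"category": "deploy", "learned": 0.82, "situational": 0.97}
seal = seal_budget_inputs(inputs, key)

print(verify_budget_inputs(inputs, seal, key))                       # True
print(verify_budget_inputs({**inputs, "learned": 0.99}, seal, key))  # False
```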
4. Trust-Labeled Dataflow: Pervasive Taint Tracking

4.1 Why Boundary Sanitization Isn't Enough

v1.1 sanitized signals at the Signal Bus boundary. Security research reveals this is necessary but insufficient — "undifferentiated memory by source" is identified as a core vulnerability class in agentic systems. The most dangerous exploits aren't bugs; they're high-level abuses of trust and configuration that convert features into compromise paths by flowing untrusted data into privileged tool invocations.
VAK implements pervasive taint tracking: every piece of data flowing through the system carries a trust provenance label from origin to execution. This is not best-effort filtering at one boundary; it's defense in depth with structural enforcement throughout the pipeline.

4.2 Trust Levels

Every data element in VAK is tagged with one of four trust levels:

| Trust Level | Source Examples | What It Can Influence | Rung Access |
| --- | --- | --- | --- |
| Verified | CI test results, signed commits, human approvals, production metrics from authenticated monitoring | All decisions; all rungs | Rung 1-4 |
| Internal | PAS agent outputs, VAK-generated candidates, internal system events | All decisions; Rung 1-3 directly; Rung 4 requires verified evidence | Rung 1-3 (Rung 4 requires promotion) |
| External | User-submitted support tickets, community forum posts, fetched web content | Can inform Rung 1-2 candidates; cannot directly trigger Rung 3-4 | Rung 1-2 only |
| Untrusted | Anonymous inputs, content failing schema validation, signals from compromised or unknown sources | Logged only; cannot inform any candidate | None (logged for forensics) |

4.3 Taint Propagation Rules

TAINT PROPAGATION:

Data inherits the LOWEST trust level of its inputs.
   - If a candidate is derived from 3 verified signals + 1 external signal,
     the candidate's taint_state = EXTERNAL.

Trust level can only be PROMOTED through explicit verification.
   - External → Internal: sanitization agent validates + schema-checks content
   - Internal → Verified: human approval or cryptographic verification
   - Untrusted → anything: requires human review + explicit promotion

TAINT BARRIERS are enforced at execution boundaries.
   - Rung 3 executor rejects any ActionCandidate with taint_state = EXTERNAL
     unless the policy explicitly allows external-influenced sandbox runs.
   - Rung 4 executor rejects anything below VERIFIED taint_state.

Taint state is RECORDED in every Receipt.
   - Forensic analysis can trace "was this action influenced by untrusted data?"
   - Budget computations that relied on tainted evidence are flagged in audit.
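
The inheritance and promotion rules above map naturally onto an ordered enum. The sketch below is one possible encoding, not the PRD's data model: the `TaintLevel` ordering mirrors §4.2, while `propagate`, `promote`, and the check labels in `PROMOTIONS` are hypothetical names chosen for the example.

```python
from enum import IntEnum
from typing import Iterable


class TaintLevel(IntEnum):
    UNTRUSTED = 0
    EXTERNAL = 1
    INTERNAL = 2
    VERIFIED = 3


def propagate(inputs: Iterable[TaintLevel]) -> TaintLevel:
    """Derived data inherits the lowest trust level among its inputs."""
    return min(inputs)


# Explicit promotion steps: (from, to) -> the check that must have passed.
PROMOTIONS = {
    (TaintLevel.EXTERNAL, TaintLevel.INTERNAL): "sanitization",
    (TaintLevel.INTERNAL, TaintLevel.VERIFIED): "human_or_crypto",
}


def promote(current: TaintLevel, target: TaintLevel, check_passed: str) -> TaintLevel:
    """Promote only along a listed step, and only if its check passed."""
    if current is TaintLevel.UNTRUSTED:
        required = "human_review"  # Untrusted data always needs human review
    else:
        required = PROMOTIONS.get((current, target))
    if required is None or check_passed != required:
        raise PermissionError(f"cannot promote {current.name} to {target.name}")
    return target


# 3 verified signals + 1 external signal -> the candidate is EXTERNAL.
candidate = propagate([TaintLevel.VERIFIED] * 3 + [TaintLevel.EXTERNAL])
print(candidate.name)  # EXTERNAL
```

Because promotion is only defined one step at a time, an External input cannot jump straight to Verified; it has to pass sanitization first and then a human or cryptographic check.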
4.4 Integration with Existing Components

| Component | Taint Integration |
| --- | --- |
| Signal Bus | Every signal tagged at ingestion with source trust level |
| Initiative Compiler | ActionCandidates inherit taint from constituent signals |
| Autonomy Evaluator | Taint state checked against rung requirements before execution |
| Receipt Store | taint_snapshot field records full provenance chain |
| Flight Recorder | Taint state visible in Evidence Trail; tainted decisions highlighted |
| Capability Contracts | Contracts can declare max_input_taint; some skills refuse to run on external-tainted inputs |

5. The Signal Bus: VAK's Nervous System

5.1 What Wakes VAK Up

VAK doesn't poll. VAK doesn't wait for prompts. VAK has a Signal Bus: a continuous stream of events from the project's ecosystem that trigger evaluation and potential action.

┌──────────────────────────────────────────────────────────────┐
│                          SIGNAL BUS                          │
│                                                              │
│ Source            Signal Type           Cadence     Trust    │
│ ───────────────── ───────────────────── ─────────── ──────── │
│ Git               commits, PRs, diffs   Real-time   Verified │
│ CI/CD             test results, builds  Real-time   Verified │
│ Production        health, errors, logs  Continuous  Verified │
│ Billing           spend, P&L, trends    Hourly      Verified │
│ Support           tickets, emails       On-arrival  External │
│ Community         forum posts, issues   Periodic    External │
│ Dependencies      CVEs, updates         Daily       Internal │
│ Advertising       spend, CTR, ROAS      Hourly      Verified │
│ Documentation     drift, staleness      Daily       Internal │
│ Schedules         cron, heartbeat       Config      Verified │
│ Human Messages    Slack, email, SMS     On-arrival  Verified │
│ Task Residue      unfinished work       Daily       Internal │
│ Cost Telemetry    token spend, thrash   Continuous  Verified │
│ Competitor Intel  market changes        Weekly      External │
└──────────────────────────────────────────────────────────────┘
5.2 Signal Sanitization Pipeline

Every signal passes through a four-stage sanitization pipeline before reaching the Initiative Compiler:

Schema validation: Malformed inputs are logged (as Untrusted) and dropped.

Trust-level tagging: Every signal tagged with source trust level per §4.2.

Content sanitization: External and untrusted signals are processed by a dedicated sanitization agent operating at Rung 1 only (no execution capability). This agent summarizes content without preserving adversarial payload structure.

Rate limiting: Anomalous signal volume from any source triggers throttling and human notification.
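
The four stages can be read as a single pass over each incoming signal. The sketch below is a toy illustration under several assumptions: the signal shape (`source` and `body` fields), the `SOURCE_TRUST` mapping, the `SignalPipeline` class, and the per-source limit are all hypothetical, and `summarize` is only a placeholder for the dedicated sanitization agent, not a real defense against adversarial content.

```python
from collections import Counter
from typing import Optional

# Hypothetical source-to-trust mapping in the spirit of §4.2.
SOURCE_TRUST = {"ci": "verified", "dependencies": "internal", "support": "external"}
REQUIRED_FIELDS = {"source", "body"}


def summarize(text: str) -> str:
    """Placeholder for the sanitization agent: collapse whitespace and truncate."""
    return " ".join(text.split())[:200]


class SignalPipeline:
    def __init__(self, max_per_window: int = 100) -> None:
        self.max_per_window = max_per_window  # Assumed per-source limit
        self.counts: Counter = Counter()
        self.forensic_log: list = []

    def process(self, signal: dict) -> Optional[dict]:
        # Stage 1: schema validation. Malformed input is logged as untrusted and dropped.
        if not REQUIRED_FIELDS.issubset(signal) or not isinstance(signal["body"], str):
            self.forensic_log.append({"trust": "untrusted", "raw": signal})
            return None
        # Stage 2: trust-level tagging. Unknown sources default to untrusted.
        trust = SOURCE_TRUST.get(signal["source"], "untrusted")
        # Stage 3: content sanitization for external and untrusted signals.
        body = signal["body"]
        if trust in ("external", "untrusted"):
            body = summarize(body)
        # Stage 4: rate limiting. Flag sources that exceed the window limit.
        self.counts[signal["source"]] += 1
        throttled = self.counts[signal["source"]] > self.max_per_window
        return {"source": signal["source"], "trust": trust, "body": body, "throttled": throttled}


pipeline = SignalPipeline(max_per_window=1)
print(pipeline.process({"source": "support", "body": "App   crashes\non login"}))
```

A real pipeline would also reset the counters per time window and notify a human when throttling starts; both are left out to keep the sketch short.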
5.3 Signal → Candidate → Action Pipeline
Signal Bus ──► Initiative Compiler ──► Autonomy Evaluator ──► Execution
                                               │
                                   ┌───────────┴────────────┐
                                   │ Initiative Budget      │
                                   │ composite check        │
                                   │ + taint barrier        │
                                   │                        │
                                   │ Budget ≥ R4 + taint    │
                                   │   ≥ VERIFIED?          │──► Execute on Host
                                   │ Budget ≥ R3 + taint    │
                                   │   ≥ INTERNAL?          │──► Execute in Sandbox
                                   │ Budget ≥ R2?           │──► Draft
                                   │ Budget ≥ R1?           │──► Suggest
                                   │ Budget < R1?           │──► Log & Skip
                                   └────────────────────────┘
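
The decision box in the diagram reads as a top-down cascade: try the highest rung first and fall through when either the budget or the taint barrier fails. The sketch below assumes numeric thresholds for R1 to R4 (the document only names them Low through Very High), and the `Taint` enum and `evaluate` function are illustrative names. The early return for untrusted input reflects the §4.2 rule that untrusted data cannot inform any candidate.

```python
from enum import IntEnum


class Taint(IntEnum):
    UNTRUSTED = 0
    EXTERNAL = 1
    INTERNAL = 2
    VERIFIED = 3


# Assumed numeric thresholds; the PRD only labels them Low / Medium / High / Very High.
R1, R2, R3, R4 = 0.2, 0.4, 0.6, 0.8


def evaluate(budget: float, taint: Taint) -> str:
    """Return the highest action the budget and taint state jointly permit."""
    if taint is Taint.UNTRUSTED:
        return "log_and_skip"
    if budget >= R4 and taint >= Taint.VERIFIED:
        return "execute_on_host"
    if budget >= R3 and taint >= Taint.INTERNAL:
        return "execute_in_sandbox"
    if budget >= R2:
        return "draft"
    if budget >= R1:
        return "suggest"
    return "log_and_skip"


print(evaluate(0.9, Taint.VERIFIED))  # execute_on_host
print(evaluate(0.9, Taint.EXTERNAL))  # draft
print(evaluate(0.7, Taint.INTERNAL))  # execute_in_sandbox
```

Note how a high budget with External taint falls through to Draft: the taint barrier, not the budget, is the binding constraint in that case.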
5.4 The Initiative Compiler

The Initiative Compiler synthesizes multiple signals into ActionCandidates:

_"3 support tickets (External) about the same error + a spike in 500s (Verified) + the last deploy touched that module (Verified)"_ → Candidate taint: EXTERNAL (due to ticket influence). Can Draft a hotfix but cannot auto-execute until the verified signals alone justify the action.

_"Dependency X has a critical CVE (Internal) + our lockfile pins affected version (Verified) + no test coverage (Verified)"_ → Candidate taint: INTERNAL. Can execute in sandbox.

_"CI reports 3 consecutive test failures (Verified) + last commit introduced a regression (Verified)"_ → Candidate taint: VERIFIED. Eligible for all rungs up to budget.
5.5 Task Residue: The Feature Nobody Else Has

One of VAK's most powerful signal sources is task residue: the unfinished business that accumulates in any project. VAK maintains a persistent awareness of:

PRDs that were approved but never implemented

Code TODOs and FIXMEs with age tracking

PRs that have been open longer than the team average

Tests that were marked skip and never re-enabled

Documentation that references outdated APIs or versions

Feature flags that were "temporary" months ago

Dependencies with pending security advisories
Every morning, VAK's task residue digest surfaces the highest-impact items with a recommended action plan. It's the institutional memory that human teams lose to context switching and attrition.

6. The Four Formations: Deep Architecture

6.1 Design Formation

Mission: Transform requirements into actionable architecture.

| Role | PAS Mapping | Responsibilities |
| --- | --- | --- |
| Design Lead | Architect | Requirements synthesis, architecture decisions, stakeholder alignment |
| Research Analyst | Director-Docs | Competitive analysis, technology evaluation, feasibility studies |
| Technical Writer | Manager-Docs | SOW, TRD, PRD, SPEC generation and maintenance |
| UX Researcher | Director-Data | User research synthesis, persona development, journey mapping |

Autonomous capabilities (budget-gated):

Generate first-draft PRDs from rough ideas or voice notes (Rung 2)

Conduct competitive analysis via web research and synthesis (Rung 2, External taint — suggestions only)

Update existing documentation to reflect architecture decisions (Rung 3 — sandbox validation of doc builds)

Publish approved documentation to team wiki (Rung 4)
Escalation patterns:
Architecture decisions that affect cost > $X → Human approval required

Scope changes that affect timeline > Y days → Human approval required

Any decision that contradicts a previous human-approved decision → Escalate with context
6.2 Development Formation

Mission: Build, test, and validate the software.

| Role | PAS Mapping | Responsibilities |
| --- | --- | --- |
| Tech Lead | Architect | Sprint planning, task decomposition, code review oversight |
| Backend Lead | Director-Code | Code architecture, integration patterns, performance |
| QA Lead | Director-DevSecOps | Test strategy, coverage analysis, security scanning |
| DevOps Engineer | Manager-DevSecOps | CI/CD configuration, environment management |
| Developers (n) | Programmers | Code generation, unit tests, bug fixes |

Autonomous capabilities (budget-gated):

Write code for well-specified tasks with test coverage (Rung 3)

Run and validate test suites in sandbox (Rung 3)

Create feature branches and draft PRs (Rung 2-3)

Merge approved PRs with passing CI (Rung 4)

Manage dependency updates with test validation (Rung 3)

Respond to code review comments with fixes (Rung 3)
Escalation patterns:
Architectural changes spanning > 3 modules → Human review

Test coverage drop > 5% → Human notification

Performance regression > 10% → Block and escalate

Any changes to authentication, authorization, or payment flows → Mandatory human review
6.3 Deployment Formation

Mission: Ship reliably and reversibly.

| Role | PAS Mapping | Responsibilities |
| --- | --- | --- |
| Release Manager | Architect | Release planning, go/no-go decisions, rollback authority |
| Deployment Engineer | Director-DevSecOps | Pipeline execution, environment promotion |
| Validation Engineer | Director-Code | Smoke tests, canary analysis, health verification |
| Documentation | Director-Docs | Release notes, changelog, migration guides |

Autonomous capabilities (budget-gated):

Generate release notes from merged PRs (Rung 2)

Deploy to staging environments (Rung 3)

Run canary analysis and health checks (Rung 3)

Deploy to production with canary rollout (Rung 4 — highest budget required)

Execute automated rollback on health check failure (Rung 4 — but with _lower_ budget threshold because rollback is a safety action)
Escalation patterns:
Production deployment → Always requires human go/no-go (configurable)

Database migrations → Mandatory human review of migration plan

Breaking API changes → Escalate with consumer impact analysis
6.4 Operations Formation

Mission: Keep it running, keep it improving, keep it profitable.

| Role | PAS Mapping | Responsibilities |
| --- | --- | --- |
| Operations Manager | Architect | Priority triage, resource allocation, budget monitoring |
| SRE | Director-DevSecOps | Health monitoring, incident response, capacity planning |
| Support Lead | Director-Docs | Ticket triage, FAQ generation, escalation routing |
| Analyst | Director-Data | P&L analysis, ad effectiveness, usage analytics |
| Community Manager | Manager-Docs | Forum monitoring, community engagement, feedback synthesis |

This is where VAK becomes indispensable. Operations is the formation that runs 24/7/365 and handles the work that human teams deprioritize because it's unglamorous but critical.

Autonomous capabilities (budget-gated):

Monitor production health and restart failed services (Rung 4, earned through track record)

Triage support tickets by severity and route to appropriate responders (Rung 3, External taint acknowledged)

Draft responses to common support questions using KB (Rung 2)

Generate daily/weekly P&L and operational reports (Rung 2)

Analyze advertising spend effectiveness and recommend adjustments (Rung 2)

Read and categorize community forum posts (Rung 3, External taint)

Identify and draft patches for recurring production issues (Rung 2-3)

Generate upgrade proposals when new dependency versions offer meaningful improvements (Rung 2)

Track SLA compliance and alert before breaches (Rung 3)
The morning briefing:
╔═══════════════════════════════════════════════════════════════╗

║ VAK MORNING BRIEFING — 2026-02-20 06:00 EST ║

╠═══════════════════════════════════════════════════════════════╣

║ ║

║ 🟢 PRODUCTION: All services healthy. 99.97% uptime (7d). ║

║ ║

║ 📊 BUSINESS: ║

║ • Revenue: $14,280 (MTD) — on track for $22.1K target ║

║ • Ad spend: $2,100 (MTD) — ROAS 3.2x (down from 3.8x) ║

║ • Recommendation: Pause Campaign B (ROAS 1.1x), reallocate ║

║ to Campaign A (ROAS 5.4x). Draft ready for your approval. ║

║ ║

║ 📧 SUPPORT: 12 new tickets overnight ║

║ • 8 auto-resolved (password resets, known issues) ║

║ • 3 routed to engineering (new bug pattern detected) ║

║ • 1 escalated to you (enterprise client, SLA-sensitive) ║

║ ║

║ 🔧 ENGINEERING: ║

║ • CVE-2026-1234 affects requests 2.31.x — patch drafted, ║

║ tests pass in sandbox. Approve merge? [Yes/No] ║

║ • 2 PRs awaiting review (avg age: 14 hours) ║

║ • Flaky test test_auth_refresh failed 3/10 runs — root ║

║ cause analysis complete, fix drafted. ║

║ ║

║ 💡 TASK RESIDUE: ║

║ • TODO in billing_service.py (age: 47 days): "Add retry ║

║ logic for Stripe webhook failures" — implementation ready. ║

║ • Feature flag new_dashboard enabled 62 days ago with no ║

║ cleanup. Removal PR drafted. ║

║ ║

║ 🔮 UPCOMING (out-of-the-loop preview): ║

║ • VAK plans to run weekly dependency audit today at 14:00. ║

║ • Draft PR for billing retry logic pending your review. ║

║ • Sandbox test run scheduled for feature-flag cleanup. ║

║ ║

║ INITIATIVE BUDGET SPENT OVERNIGHT: $4.20 ║

║ ACTIONS TAKEN: 23 (8 Rung 1, 11 Rung 2, 4 Rung 3, 0 Rung 4)║

║ TRUST STATUS: Learned ██████████░░ 82% | Situational ████████████ 97% ║

╚═══════════════════════════════════════════════════════════════╝
Note the out-of-the-loop preview section: it implements the Endsley & Kaber safety principle by ensuring the human always has forward visibility into VAK's planned actions, not just retrospective reports.

7. Capability Contracts: Solving the Skill Ecosystem's Fatal Flaw

7.1 The Design Principle

Every VAK skill must declare a Capability Contract before it can execute. The contract declares what the skill can touch, what data it handles, what side effects it produces, and what resources it consumes. The Autonomy Kernel enforces contracts at registration, invocation, and runtime. No contract, no execution. Period.

7.2 Contract Schema

skill: dependency-upgrade
version: 1.2.0
author: verdict-core
verified: true
capability_contract:
  publisher:
    org: "verdict-core"
    signing_key_ref: "keys/verdict-core-2026.pub"
    attestations:
      - type: "build_hash"
        value: "sha256:a1b2c3d4..."
      - type: "sbom"
        ref: "sboms/dependency-upgrade-1.2.0.json"
  filesystem:
    read: ["requirements.txt", "pyproject.toml", "package.json", "Pipfile"]
    write: ["requirements*.txt", "pyproject.toml", "package-lock.json"]
  network:
    allowed_domains: ["pypi.org", "registry.npmjs.org", "api.github.com"]
  shell:
    allowed_commands: ["pip install --dry-run", "pip install", "npm update", "pytest", "npm test"]
  credentials:
    required: []
    optional: ["GITHUB_TOKEN"]
    access_method: "sealed_reference"
  data_classification:
    handles_pii: false
    handles_secrets: false
    handles_production_data: false
  max_input_taint: "internal"  # Refuses to run on external-tainted inputs
  side_effects: ["writes_files", "network_egress"]
  limits:
    max_cost_usd: 0.50
    max_duration_seconds: 300
    max_file_modifications: 5
    rate_limit: "10/hour"
  minimum_rung: 3
  sandbox_required: true
  preconditions: ["test_suite_exists", "git_clean_working_tree"]
  approval_policy:
    rung_3: "auto"
    rung_4: "human_review"
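
To make the contract fields concrete, here is a hedged sketch of how a kernel might check a contract like this one when a skill is invoked. Only a handful of fields are modeled, and the `Contract` class, `invocation_errors`, and `write_allowed` are hypothetical helpers rather than the Autonomy Kernel's API. `max_input_taint` is interpreted as the least trusted level the skill will accept.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Set

TAINT_ORDER = ["untrusted", "external", "internal", "verified"]  # Least to most trusted


@dataclass(frozen=True)
class Contract:
    minimum_rung: int
    max_input_taint: str
    preconditions: List[str]
    write_globs: List[str]


def invocation_errors(contract: Contract, permitted_rung: int,
                      input_taint: str, satisfied: Set[str]) -> List[str]:
    """Collect every reason the skill may not run; an empty list means it may."""
    errors = []
    if permitted_rung < contract.minimum_rung:
        errors.append(f"budget permits Rung {permitted_rung}, contract needs Rung {contract.minimum_rung}")
    if TAINT_ORDER.index(input_taint) < TAINT_ORDER.index(contract.max_input_taint):
        errors.append(f"input taint '{input_taint}' is below '{contract.max_input_taint}'")
    for name in contract.preconditions:
        if name not in satisfied:
            errors.append(f"precondition not met: {name}")
    return errors


def write_allowed(contract: Contract, path: str) -> bool:
    """Allowlist check. fnmatch's '*' also matches '/', so a real enforcer
    would need stricter path handling than this."""
    return any(fnmatch(path, pattern) for pattern in contract.write_globs)


contract = Contract(
    minimum_rung=3,
    max_input_taint="internal",
    preconditions=["test_suite_exists", "git_clean_working_tree"],
    write_globs=["requirements*.txt", "pyproject.toml", "package-lock.json"],
)

print(invocation_errors(contract, 3, "internal", {"test_suite_exists", "git_clean_working_tree"}))  # []
print(write_allowed(contract, "requirements-dev.txt"))  # True
print(write_allowed(contract, "src/app.py"))            # False
```

Returning all failures at once, rather than stopping at the first, is a design choice that suits an audit trail: the resulting receipt can record every reason a skill was refused.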

7.3 Contract Enforcement: Three Validation Points

| Validation Point | What's Checked | Failure Mode |
| --- | --- | --- |
| Registration time | Contract validated against project security policy. Publisher signature verified. Attestations checked. | Skills requesting access beyond policy limits are rejected at install. |
| Invocation time | Contract re-validated against current Initiative Budget. Rung requirement checked. Taint state verified against max_input_taint. Preconditions evaluated. | A skill requiring Rung 3 can't run if the budget only permits Rung 2. A skill refusing external-tainted inputs won't execute on externally-derived candidates. |
| Runtime | Sandbox enforces contract limits via mount allowlists, network policies, cost monitors, and duration limits. | Any contract violation immediately terminates execution, triggers a budget crash for that skill category, and generates a security receipt. |

7.4 Skill Governance Pipeline

| Stage | Gate | What Happens |
| --- | --- | --- |
| Submission | Publisher signing + contract declaration | Unsigned or contractless skills cannot enter the pipeline |
| Automated scan | Static analysis + contract consistency check + known-bad pattern matching | Catches obvious malware and contract violations |
| Staged rollout | Skill available only to author → beta testers → general availability | Limits blast radius during early adoption |
| Continuous monitoring | Runtime behavior tracked against contract; anomalies trigger alerts | Skills that exceed declared capabilities are flagged and can be auto-suspended |
| Revocation | Emergency kill switch + mean-time-to-revoke target: <30 minutes | Compromised skills disabled across all VAK instances |

8. Receipt Architecture: The Cryptographic Audit Trail

8.1 Dual-Layer Model

Receipts serve two distinct purposes that are in tension: audit integrity (requires immutability and redaction) and incident investigation (requires access to full artifacts). VAK resolves this with a dual-layer model:

| Layer | Contents | Access | Retention |
| --- | --- | --- | --- |
| Receipt Log (append-only Merkle log) | Hashed, redacted, signed summaries of every action. No raw secrets, credentials, or PII. Chain integrity via previous_receipt_hash. | Broad read access for team leads, auditors, compliance. Write-once by Autonomy Kernel only. | Indefinite (compact, hash-sized entries) |
| Forensic Vault | Encrypted, full-fidelity artifacts: raw tool outputs, complete diffs, unredacted context, sandbox snapshots. Role-gated decryption. | Restricted to authorized investigators with explicit justification. Decryption logged as a Receipt event itself. | Configurable retention (30-180 days default; indefinite for regulated environments) |

Why dual-layer: Hash-only receipts are well suited to audit because they prove what happened without being a credential leak vector. But during incident response, investigators need the actual artifacts. The forensic vault stores them encrypted with role-based access keys, so accessing investigation materials is itself audited and gated. This directly addresses the infostealer threat: even if an attacker compromises the Receipt Log, they get only hashes and redacted summaries. Even if they access the Forensic Vault storage, the contents are encrypted with keys they don't possess.

8.2 Receipt Schema

    from __future__ import annotations  # lets fields reference types declared later or elsewhere

    from dataclasses import dataclass
    from datetime import datetime
    from typing import List, Optional
    from uuid import UUID

    # Types such as TaintState, TaintLevel, ArtifactRef, ApprovalRecord, AnchorRecord,
    # RedactionPolicy, ResourceMetrics, and EvidenceRef are not defined in this section.


    @dataclass
    class Receipt:
        # Identity
        receipt_id: UUID
        action_id: UUID
        run_id: Optional[UUID]

        # What was decided
        trigger_signal: str
        budget_snapshot: BudgetSnapshot  # Both learned + situational components
        taint_snapshot: TaintState  # Full provenance chain of inputs
        executed_rung: int
        formation: str

        # What happened
        tool_calls: List[ToolCallRecord]
        artifacts: List[ArtifactRef]  # References to Forensic Vault entries
        inputs_hash: bytes  # SHA-256 of normalized inputs (REDACTED)
        outputs_hash: bytes

        # Who approved
        approvals: List[ApprovalRecord]

        # Sandbox provenance (if Rung 3)
        sandbox_provenance: Optional[SandboxRecord]

        # Credential access
        sealed_credential_refs: List[str]  # Which sealed refs were resolved (never raw values)

        # Timing
        started_at: datetime
        completed_at: datetime
        duration_ms: int

        # Integrity
        signature: bytes  # Signed by Verdict runtime key
        merkle_leaf: bytes  # Leaf hash for Merkle log
        previous_receipt_hash: bytes  # Chain integrity
        merkle_epoch: int  # Which signed tree head covers this receipt

        # Optional external anchoring
        anchor_ref: Optional[AnchorRecord]

        # Redaction
        redaction_policy: RedactionPolicy

        # Forensic vault references
        vault_artifact_ids: List[UUID]  # Encrypted artifact IDs in Forensic Vault


    @dataclass
    class ToolCallRecord:
        tool_name: str
        capability_contract_ref: str
        args_hash: bytes
        output_hash: bytes
        input_taint: TaintLevel  # Trust level of inputs to this tool call
        duration_ms: int
        cost_usd: float
        exit_code: Optional[int]
        vault_detail_id: Optional[UUID]  # Full args/output in Forensic Vault


    @dataclass
    class SandboxRecord:
        image_hash: str
        mount_allowlist: List[str]
        network_policy: str
        egress_log_hash: bytes
        resource_usage: ResourceMetrics
        contract_violations: List[str]  # Any contract boundary hits (even non-fatal)


    @dataclass
    class BudgetSnapshot:
        category: str
        learned_component: float
        situational_component: float
        composite_score: float
        policy_ceiling: float
        rung_permitted: int
        contributing_evidence: List[EvidenceRef]  # What evidence supported this score

8.3 Secret Hygiene — The Infostealer Defense

Credentials never enter agent context windows. Skills access credentials via _sealed references_ that resolve at execution time inside the sandbox. The agent sees $SEALED{github_token}, never the actual token value.

Receipt Log uses hashes only. Tool arguments, outputs, and inputs are SHA-256 hashes. Forensic Vault stores encrypted originals separately.

Forensic Vault access is itself receipted. Opening an encrypted artifact generates a Receipt, creating an audit trail of who investigated what and when.

Short-lived credentials with rotation. Host execution (Rung 4) uses scoped, short-lived tokens that expire after the action completes. Anomaly detection triggers immediate rotation.
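The sealed-reference idea above can be illustrated with a minimal sketch. It assumes the `$SEALED{name}` placeholder syntax shown in the text; the `resolve_sealed` helper and the in-memory `vault` mapping are hypothetical stand-ins for whatever secret store resolves references inside the sandbox.

```python
import re
from typing import List, Mapping, Tuple

# Matches placeholders of the form $SEALED{name}
SEALED_PATTERN = re.compile(r"\$SEALED\{([A-Za-z0-9_]+)\}")


def resolve_sealed(template: str, vault: Mapping[str, str]) -> Tuple[str, List[str]]:
    """Substitute sealed references at execution time (hypothetical helper).

    Returns the resolved string plus the names that were resolved, so a
    receipt can record which references were used without the raw values.
    """
    used: List[str] = []

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in vault:
            raise KeyError(f"unknown sealed reference: {name}")
        used.append(name)
        return vault[name]

    return SEALED_PATTERN.sub(_substitute, template), used


# The agent only ever handles the template; resolution happens in the sandbox.
command = "Authorization: token $SEALED{github_token}"
resolved, refs = resolve_sealed(command, {"github_token": "example-value"})
```

Only `refs` (the reference names) would be a candidate for `sealed_credential_refs` on a Receipt; `resolved` should never leave the execution boundary.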

8.4 Tamper-Evident Anchoring (Optional)

VAK supports external anchoring following the Certificate Transparency / Sigstore model:

Internal: Append-only Merkle tree. Each receipt includes previous_receipt_hash. Periodically, a Signed Tree Head (STH) is generated covering all receipts in the epoch.

External: The STH (not the receipts themselves) is published to a transparency log or public blockchain. No sensitive data leaves the system.

Verification: External auditors request inclusion proofs (proving a specific receipt is in the tree) and consistency proofs (proving the tree has only grown, never been modified). Standard Merkle log operations.

Guarantees: Non-repudiation for autonomous actions with cryptographic tamper evidence, compatible with regulated-industry compliance requirements.
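As an illustration of the inclusion proofs mentioned above, here is a small Merkle tree sketch. It is a simplified construction (an unpaired node is promoted unchanged to the next level) with leaf/node prefix bytes in the style of Certificate Transparency; it is not claimed to be byte-compatible with any particular transparency log, and all function names are illustrative.

```python
import hashlib
from typing import List, Tuple


def leaf_hash(receipt_bytes: bytes) -> bytes:
    # 0x00 prefix separates leaf hashes from interior node hashes
    return hashlib.sha256(b"\x00" + receipt_bytes).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        nxt.append(level[-1])  # unpaired node is promoted unchanged
    return nxt


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("tree must contain at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def inclusion_proof(leaves: List[bytes], index: int) -> List[Tuple[str, bytes]]:
    """Return sibling hashes from leaf to root, each tagged 'L' or 'R'."""
    proof: List[Tuple[str, bytes]] = []
    level = list(leaves)
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append(("L" if sibling < index else "R", level[sibling]))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[str, bytes]], root: bytes) -> bool:
    acc = leaf
    for side, sibling in proof:
        acc = node_hash(sibling, acc) if side == "L" else node_hash(acc, sibling)
    return acc == root


leaves = [leaf_hash(f"receipt-{i}".encode()) for i in range(5)]
root = merkle_root(leaves)  # the value a Signed Tree Head would cover
proof = inclusion_proof(leaves, 2)
```

An auditor holding only `root` and `proof` can confirm that receipt 2 is in the tree without seeing any other receipt, which is why publishing the STH alone leaks no sensitive data.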

9. Security Architecture: Threat Model and Defense Layers

9.1 The Three Attacker Playbooks

Playbook 1: Skill Supply-Chain Attacks

| Attack Stage | VAK Defense |
| --- | --- |
| Publish malicious skill | Capability Contract required; publisher signing mandatory; automated scanning |
| Install | Contract validated against project security policy |
| Execute | Sandbox isolation; mount allowlists; network policies; cost/duration limits |
| Exfiltrate | Network egress restricted to declared domains; egress log captured in Receipt |
| Persist | Continuous runtime monitoring; anomalies trigger auto-suspension and budget crash |

Playbook 2: Infostealer Harvesting of the "Agent Soul"

| Attack Stage | VAK Defense |
| --- | --- |
| Endpoint compromise | Credentials never stored in plaintext; sealed references only |
| Token extraction | Short-lived scoped credentials; rotation on anomaly detection |
| Context theft | Project context encrypted at rest; decryption requires authenticated session |
| Impersonation | Receipt chain provides auditable trail; anomalous behavior triggers alerts |

Playbook 3: Indirect Prompt Injection via Poisoned Inputs

| Attack Stage | VAK Defense |
| --- | --- |
| Inject adversarial content | Signal Bus sanitization; trust-level tagging; schema validation |
| Interpret as instruction | Low-trust signals processed by Rung 1 sanitization agent only |
| Execute based on injection | Taint barriers: external-tainted candidates structurally cannot reach Rung 3-4 |
| Escalate privileges | Per-category budgets prevent cross-domain escalation; circuit breakers limit cascading |

9.2 Defense Layer Summary

| Layer | Mechanism | Playbook(s) |
| --- | --- | --- |
| Capability Contracts | Declarative permission boundaries per skill | 1 |
| Sandbox-by-Default | Container isolation with allowlists and policies | 1, 2 |
| Taint Tracking | Trust-level propagation with structural barriers at execution boundaries | 3 |
| Signal Sanitization | Trust tagging, schema validation, sanitization agents | 3 |
| Sealed Credentials | Never enter agent context; resolved at execution time | 2 |
| Short-Lived Tokens | Scoped, rotating, anomaly-triggered revocation | 2 |
| Composite Initiative Budget | Learned + situational components prevent both gaming and context-blind execution | 3 |
| Dual-Layer Receipt Store | Hash-only audit log + encrypted forensic vault | 1, 2, 3 |
| Cost Circuit Breakers | Hard limits per action, per hour, per day | 1, 3 |
| Budget Crash on Failure | Failed actions immediately reduce learned component | 1, 3 |
| Merkle Anchoring (optional) | External tamper-evidence via STH publication | 1, 2, 3 |

10. The Communication Layer: How VAK Talks to Humans

10.1 Concierge Channels

| Channel | Use Case | Message Types |
| --- | --- | --- |
| Slack / Teams | Team notifications, approvals, status updates | Morning briefings, escalations, PR notifications, deployment approvals |
| Email | Formal reports, external communication, audit trails | Weekly summaries, P&L reports, SLA reports, incident post-mortems |
| iOS Messages / SMS | Urgent escalations, on-call notifications | Production incidents, security alerts, budget threshold breaches |
| Verdict HMI | Deep interaction, debugging, formation management | Flight Recorder, Budget dashboard, configuration |
| Internal (PAS) | Agent-to-agent coordination | Job cards, status updates, escalation chains |

10.2 Conversation Intelligence

When VAK escalates, it sends a conversation package:

The situation: What happened, with evidence and taint provenance

The analysis: What VAK thinks is going on, with confidence levels

The options: 2-3 recommended courses of action with trade-offs

The ask: What specific decision VAK needs from the human

The context: Links to relevant receipts, diffs, metrics, prior decisions

10.3 Escalation Priority Matrix

| Severity | Channel | Response SLA | Example |
| --- | --- | --- | --- |
| P0: Production Down | SMS + Slack + Email | Immediate | Service crash, data loss, security breach |
| P1: Degraded | Slack + Email | 1 hour | Performance regression, partial outage, SLA at risk |
| P2: Action Needed | Slack | 4 hours | Approval needed, budget threshold, failed deploy |
| P3: Informational | Email (batched) | Next briefing | Status updates, metrics summaries, community digest |
| P4: Background | Verdict HMI only | Async | Task residue, optimization suggestions, trend analysis |

10.4 Channel Security

Device trust verification: Approval messages include a signed challenge verified on authenticated devices.

Channel-appropriate sensitivity: SMS alerts contain severity + action link only. Detailed context is in the authenticated Verdict HMI.

Approval replay protection: Each approval includes a nonce tied to a specific Receipt. Replay for a different action is cryptographically rejected.
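The replay-protection rule can be sketched as follows. This is a minimal illustration, assuming a shared HMAC key as a stand-in for the device-held signing key (a real deployment would more likely use asymmetric signatures); the `ApprovalVerifier` class and its method names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict


class ApprovalVerifier:
    """Binds each approval nonce to exactly one receipt and accepts it once."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._pending: Dict[str, str] = {}  # nonce -> receipt_id

    def issue_challenge(self, receipt_id: str) -> str:
        nonce = secrets.token_hex(16)
        self._pending[nonce] = receipt_id
        return nonce

    def sign(self, receipt_id: str, nonce: str) -> str:
        # What the approver's authenticated device would compute (assumption: HMAC)
        message = f"{receipt_id}:{nonce}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def verify(self, receipt_id: str, nonce: str, tag: str) -> bool:
        if self._pending.get(nonce) != receipt_id:
            return False  # unknown nonce, already used, or bound to another receipt
        if not hmac.compare_digest(self.sign(receipt_id, nonce), tag):
            return False
        del self._pending[nonce]  # single use: a replay will fail the lookup above
        return True


verifier = ApprovalVerifier(key=b"demo-key-not-for-production")
nonce = verifier.issue_challenge("receipt-123")
tag = verifier.sign("receipt-123", nonce)
first = verifier.verify("receipt-123", nonce, tag)   # accepted
replay = verifier.verify("receipt-123", nonce, tag)  # rejected
```

Because the signed message includes the receipt identifier, a captured approval cannot be presented for a different action even before the nonce is consumed.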

11. Observability: The Agent Flight Recorder

11.1 Three Views

Live Graph View: Real-time visualization of agents, tasks, tool calls, and decisions. Nodes = agents, edges = communication, color = status, size = resource consumption. Click any node for context, decisions, and both Initiative Budget components.

Evidence Trail View: For any autonomous action, shows: trigger signal + taint provenance → Initiative Compiler reasoning → budget computation (learned + situational) → taint barrier evaluation → alternatives considered → expected vs actual outcomes → complete Receipt with hashes and signatures.

Decision Replay: Any past decision can be replayed with different parameters. "What if the situational component had been lower?" "What if this signal had been tagged External instead of Internal?" This serves as a calibration tool for tuning the budget function.

11.2 One-Click Controls

Pause: Freeze autonomous execution; VAK continues monitoring and drafting

Rollback: Revert last N actions via Receipt-tracked undo chains

Never Again: Permanently block a decision pattern

Budget Override: Temporarily adjust learned or situational component for specific categories

Formation Switch: Manually shift formations

11.3 Progressive Disclosure (Anti-Overwhelm)

Default view: "23 actions taken overnight; 4 need your attention." Trust status bar showing learned + situational components.

Drill level 1: Category breakdown with rung distribution and taint summary.

Drill level 2: Individual action cards with evidence trail summaries.

Drill level 3: Full Receipt with all hashes, signatures, budget snapshots, and taint chains.

Anomaly highlighting: Actions where confidence was low, budget was borderline, taint was elevated, or outcomes differed from predictions are surfaced prominently.

12. ActionCandidate Data Model

    from __future__ import annotations  # field types such as TriggerType are not defined in this section

    from dataclasses import dataclass
    from datetime import datetime
    from typing import List
    from uuid import UUID


    @dataclass
    class ActionCandidate:
        # Identity
        action_id: UUID
        candidate_created_at: datetime

        # Origin
        trigger: TriggerType
        trigger_signals: List[SignalRef]  # Each carries trust level
        synthesis_reasoning: str  # Initiative Compiler explanation

        # Trust provenance
        taint_state: TaintLevel  # Inherited from lowest-trust input signal
        taint_chain: List[TaintRecord]  # Full provenance: which signals contributed what

        # What to do
        summary: str
        plan_steps: List[PlanStep]
        target_surface: str
        formation: str

        # Requirements
        required_capabilities: List[CapabilityContractRef]
        minimum_rung: int
        preconditions: List[str]

        # Risk assessment
        risk_score: float
        data_sensitivity: DataClassification
        blast_radius: BlastRadius
        reversibility: Reversibility

        # Budget category (enables per-category learned trust)
        budget_category: str

        # Rollback
        rollback_strategy: RollbackStrategy

        # Cost
        cost_estimate: CostEstimate

        # Evidence
        evidence: List[EvidenceRef]

        # Redaction
        redaction_policy: RedactionPolicy

13. Sandbox Lifecycle Engineering

13.1 The UX Problem

Sandbox execution (Rung 3) is VAK's core safety mechanism, but observed cold-start times for containerized agent environments run 1-2 minutes. If every Rung 3 action starts a fresh container, VAK feels like a bureaucracy, not a teammate. The "always-on" feeling that made autonomous agents viral depends on execution feeling instantaneous.

13.2 Warm Sandbox Pools

VAK maintains a pool of pre-warmed sandbox containers per formation:

| Formation | Pool Size | Pre-installed | Warm-up Trigger |
| --- | --- | --- | --- |
| Development | 3-5 | Project dependencies, test frameworks, linters | On formation entry; replenished on drain |
| Deployment | 2-3 | CI/CD tools, staging configs, health check scripts | On formation transition; pre-staged with release artifacts |
| Operations | 2-3 | Monitoring tools, support KB, reporting frameworks | Always warm during Operations formation |
| Design | 1-2 | Doc generators, research tools | On formation entry |

13.3 Sandbox Lifecycle States

    ┌──────────┐    warm-up     ┌──────────┐    assign     ┌──────────┐
    │   Cold   │───────────────►│   Warm   │──────────────►│  Active  │
    │ (image)  │                │  (pool)  │               │  (task)  │
    └──────────┘                └──────────┘               └────┬─────┘
                                     ▲                          │
                                     │         complete         │
                                     │    ┌──────────┐          │
                                     └────│ Recycle  │◄─────────┘
                                          │ (scrub + │
                                          │  reset)  │
                                          └──────────┘

Cold → Warm: Container built from formation-specific image. Dependencies pre-installed. Project snapshot mounted read-only. Takes 30-90 seconds but happens _before_ a task needs it.

Warm → Active: Assigned to a specific ActionCandidate. Write layer initialized. Network policy applied per Capability Contract. Takes <2 seconds.

Active → Recycle: Task completes. Write layer captured as artifact (hashed, stored in Forensic Vault). Container scrubbed of state. Returns to warm pool. Takes 5-10 seconds.

Hibernation: During low-activity periods, warm containers can be suspended to reduce resource consumption and resumed on demand (10-15 second resume vs 30-90 second cold start).
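The lifecycle above can be read as a small state machine. The sketch below is one possible encoding, assuming hibernation is modeled as its own state reachable only from Warm; the `SandboxState` enum, the `ALLOWED` table, and `transition` are illustrative names, not part of the specification.

```python
from enum import Enum
from typing import Dict, Set


class SandboxState(Enum):
    COLD = "cold"              # image only
    WARM = "warm"              # in pool, ready to assign
    ACTIVE = "active"          # assigned to an ActionCandidate
    RECYCLE = "recycle"        # scrub + reset
    HIBERNATED = "hibernated"  # suspended warm container (assumed state name)


# Allowed transitions, taken from the descriptions in this section
ALLOWED: Dict[SandboxState, Set[SandboxState]] = {
    SandboxState.COLD: {SandboxState.WARM},
    SandboxState.WARM: {SandboxState.ACTIVE, SandboxState.HIBERNATED},
    SandboxState.ACTIVE: {SandboxState.RECYCLE},
    SandboxState.RECYCLE: {SandboxState.WARM},
    SandboxState.HIBERNATED: {SandboxState.WARM},
}


def transition(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal sandbox transition: {current.value} -> {target.value}")
    return target


state = SandboxState.COLD
for step in (SandboxState.WARM, SandboxState.ACTIVE, SandboxState.RECYCLE, SandboxState.WARM):
    state = transition(state, step)
```

Encoding the transitions as data makes it straightforward to reject shortcuts such as Active → Warm, which would skip the scrub step.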

13.4 Pre-Staged Contract Images

For frequently-used Capability Contracts, VAK pre-builds specialized container images that include only the mounts, network policies, and tools declared in the contract. This eliminates the overhead of applying policies at assign-time and provides an additional layer of isolation (the image itself lacks tools not in the contract).

14. How VAK Plugs Into Verdict: Native Architecture

14.1 VAK Is PAS Activated

| VAK Concept | Existing PAS Component | Extension Needed |
| --- | --- | --- |
| Autonomy Ladder | Tool Access Matrix (Plan/Execute modes) | Budget-gated rung evaluation with taint barriers |
| Initiative Budget | Run Pack budget controls | Composite learned + situational scoring with decay/crash |
| Signal Bus | TRON monitoring + Gateway events | External signal adapters + sanitization + trust tagging |
| Taint Tracking | (new) | Pervasive trust-level propagation from Signal Bus through Receipts |
| Initiative Compiler | Architect tier planning | Signal-to-candidate reasoning with taint inheritance |
| Capability Contracts | agent_tools.json access matrix | Skill-level contracts with signing, attestations, taint limits |
| Receipt Store | TRON communication logging | Dual-layer: append-only Merkle log + encrypted forensic vault |
| Flight Recorder | TRON + HMI observability | Evidence Trail + Decision Replay + progressive disclosure |
| Formations | Blueprint + Run Pack configuration | Formation-aware Run Pack switching + warm sandbox pools |
| Concierge Channels | HMI + (planned) Companion apps | Slack/email/SMS bridges with channel security |

14.2 Verdict Integration Hooks

| VAK Module | PAS Hook | Gateway Hook | Web Publisher Hook | HMI Hook |
| --- | --- | --- | --- | --- |
| Signal Bus | Run lifecycle events; CI/test telemetry | Chat/approval events | Build/publish/diff events | User feedback, overrides |
| Initiative Compiler | Task decomposition + Run Pack synthesis | Escalation packaging for Concierge | Candidate generation for publish workflows | "Why this candidate" UI |
| Initiative Budget | Per-category ledgers tied to run outcomes | Human approvals/rejections adjust learned component | Deployment outcomes feed success rate | Budget dashboards, component override controls |
| Taint Tracking | Taint labels on all PAS-internal events | Trust tagging on all external inputs | Taint labels on fetched/published content | Taint visibility in Evidence Trail |
| Autonomy Ladder | Rung gating on tool execution + taint barriers | Approval flows + replay protection | Staged deploys via rung gating | Progressive disclosure of actions |
| Capability Contracts | Bind tools to contracts; enforce at invocation | Policy injection for channel-sensitive actions | Publishing actions constrained by contract | Contract review and install UX |
| Receipt Store | Run receipts as first-class artifacts | Concierge approvals stored as receipt artifacts | Publish receipts after deploy/publish | Evidence trail and decision replay |
| Sandbox Pools | Warm pools per formation; pre-staged contract images | (N/A) | Sandbox publish/preview environments | Sandbox status in Flight Recorder |

14.3 New Component Map

    ┌──────────────────────────────────────────────────────────────┐
    │                          VAK KERNEL                          │
    │                                                              │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
    │  │ Signal Bus   │  │ Initiative   │  │ Autonomy     │        │
    │  │ + Sanitizer  │──│ Compiler     │──│ Evaluator    │        │
    │  │ + Trust Tag  │  │              │  │ (Composite   │        │
    │  └──────────────┘  └──────────────┘  │  Budget)     │        │
    │                                      └──────┬───────┘        │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────┴───────┐        │
    │  │ Formation    │  │ Concierge    │  │ Contract     │        │
    │  │ Manager      │  │ Router       │  │ Enforcer     │        │
    │  │ + Sandbox    │  │              │  │ + Taint      │        │
    │  │   Pools      │  │              │  │   Barrier    │        │
    │  └──────┬───────┘  └──────────────┘  └──────────────┘        │
    │         │                                                    │
    │  ┌──────┴────────────────────────────────────────────┐       │
    │  │                   PAS HIERARCHY                   │       │
    │  │  Architect → Directors → Managers → Programmers   │       │
    │  │  (formation-aware prompts + taint-aware routing)  │       │
    │  └───────────────────────────────────────────────────┘       │
    │                                                              │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
    │  │ Evidence     │  │ Receipt Log  │  │ Budget       │        │
    │  │ Store        │  │ (Merkle +    │  │ Ledger       │        │
    │  │              │  │  signed)     │  │ (learned +   │        │
    │  │              │  │              │  │  situational)│        │
    │  │              │  │ Forensic     │  │              │        │
    │  │              │  │ Vault        │  │              │        │
    │  │              │  │ (encrypted)  │  │              │        │
    │  └──────────────┘  └──────────────┘  └──────────────┘        │
    └──────────────────────────────────────────────────────────────┘

14.4 The Kernel Loop

    import asyncio


    async def kernel_loop() -> None:
        # Kernel components (signal_bus, formation_manager, receipt_log, ...) are assumed to be in scope.
        while project.is_active:
            # 1. Collect and sanitize signals with trust tagging
            raw_signals = signal_bus.collect()
            signals = signal_sanitizer.process(raw_signals)  # Trust-tag + validate + filter

            # 2. Turn signals into taint-aware action candidates
            candidates = initiative_compiler.evaluate(
                signals,
                formation=formation_manager.current,
                context=project_state,
                task_residue=residue_tracker.digest(),
            )
            # Each candidate inherits taint_state from its lowest-trust input signal

            # 3. Evaluate each candidate against composite Initiative Budget
            for candidate in candidates.prioritized():
                budget = autonomy_evaluator.compute_composite(
                    candidate,
                    learned=evidence_store.learned_score(candidate.budget_category),
                    situational=situation_monitor.current_score(),
                    policy_ceiling=project_acl.ceiling(candidate.budget_category),
                    anti_gaming=budget_integrity_checker,
                )
                rung = budget.permitted_rung()

                # 4. Enforce taint barrier: cap the rung at what the candidate's taint allows
                if not taint_barrier.permits(candidate.taint_state, rung):
                    rung = taint_barrier.max_permitted_rung(candidate.taint_state)

                if rung >= candidate.minimum_rung:
                    # 5. Execute with contract enforcement + warm sandbox
                    sandbox = sandbox_pool.acquire(formation_manager.current) if rung == 3 else None
                    result = execute(
                        candidate, rung,
                        contracts=contract_enforcer.validate(candidate),
                        sandbox=sandbox,
                    )

                    # 6. Record in dual-layer receipt system
                    vault_ids = forensic_vault.store_encrypted(result.artifacts)
                    receipt = receipt_log.append(candidate, budget, rung, result, vault_ids)
                    evidence_store.record(candidate, result, receipt)

                    if result.needs_human:
                        concierge.escalate(candidate, result, channel=priority_channel)

                    if sandbox:
                        sandbox_pool.recycle(sandbox)
                elif candidate.should_suggest:
                    # Not enough budget to act: fall back to a suggestion
                    concierge.suggest(candidate, budget)

            # 7. Out-of-the-loop: generate Rung 1-2 previews for upcoming actions
            upcoming = initiative_compiler.preview_next_cycle(formation_manager.current)
            flight_recorder.update_preview(upcoming)

            # 8. Lifecycle management
            formation_manager.evaluate_transition()
            budget_ledger.decay_learned()
            situation_monitor.refresh()
            receipt_log.anchor_if_due()  # Periodic STH publication

            await asyncio.sleep(heartbeat_interval)
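Steps 2 and 4 of the loop (taint inheritance and the taint barrier) can be sketched in isolation. The sketch assumes only the three trust levels named in this document (the fourth level is not named here), and a rung ceiling per level inferred from the text: external-tainted candidates stay below Rung 3 and Rung 4 is VERIFIED only. The numeric ordering, the `MAX_RUNG` table, and the conservative default for an empty signal list are assumptions.

```python
from enum import IntEnum
from typing import Dict, Iterable


class TaintLevel(IntEnum):
    # Higher value = more trusted (ordering is an assumption of this sketch)
    EXTERNAL = 1
    INTERNAL = 2
    VERIFIED = 3


# Highest rung each trust level may reach (inferred from the barriers described in the text)
MAX_RUNG: Dict[TaintLevel, int] = {
    TaintLevel.EXTERNAL: 2,
    TaintLevel.INTERNAL: 3,
    TaintLevel.VERIFIED: 4,
}


def inherit_taint(signal_levels: Iterable[TaintLevel]) -> TaintLevel:
    """A candidate takes the lowest trust level among its input signals."""
    levels = list(signal_levels)
    if not levels:
        return TaintLevel.EXTERNAL  # conservative default (assumption)
    return min(levels)


def permits(taint: TaintLevel, rung: int) -> bool:
    return rung <= MAX_RUNG[taint]


def max_permitted_rung(taint: TaintLevel) -> int:
    return MAX_RUNG[taint]


def gate_rung(taint: TaintLevel, budget_rung: int) -> int:
    """Combine the budget's permitted rung with the taint ceiling."""
    return budget_rung if permits(taint, budget_rung) else max_permitted_rung(taint)


candidate_taint = inherit_taint([TaintLevel.VERIFIED, TaintLevel.EXTERNAL])
effective_rung = gate_rung(candidate_taint, budget_rung=4)
```

Here a single external signal caps the candidate at Rung 2 even though the budget alone would have allowed Rung 4, which is the structural property the barrier is meant to provide.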

15. Implementation Risks and Complexity Assessment

| Component | Complexity | Why It's Hard | Key Risks | Mitigation |
| --- | --- | --- | --- | --- |
| Signal Bus + Trust Tagging | Medium | Schema discipline + per-source trust classification + adapter maintenance | Noisy triggers; incorrect trust classification | Start with 5 high-value sources; conservative trust defaults; human override on classification |
| Signal Sanitization | Medium | Active research area; no perfect prompt injection defense | False negatives (missed injections); false positives | Defense in depth: sanitization + taint barriers + budget gating. Sanitization is one layer, not the only layer. |
| Taint Tracking | Medium–High | Pervasive propagation through all components; taint barrier enforcement at execution boundaries | Overly conservative taint (everything becomes External); performance overhead of propagation | Start with 4 trust levels (not more); propagation is metadata, not deep inspection; policy allows controlled override |
| Initiative Compiler | Medium–High | Multi-signal synthesis with LLM reasoning + taint-aware candidate generation | False positive candidates; cost of LLM calls | Rule-based for top signals first; LLM-assisted as patterns stabilize |
| Autonomy Ladder + Taint Barriers | Medium | Policy gates + sandbox orchestration + taint enforcement | Approval fatigue; taint barriers blocking legitimate actions | Calibrate from real usage; "promote taint" workflow for justified escalation |
| Composite Initiative Budget | High | Two-component trust calibration + anti-gaming + per-category tracking + decay | Gaming; incorrect calibration; situational component lagging reality | External evidence; situational signals from infrastructure (not self-assessed); periodic human calibration review |
| Capability Contracts + Registry | High | Governance workflows, signing infra, review pipelines | Developer friction → ecosystem stall | Curated first-party contracts first; third-party is Phase 6 |
| Sandbox Executor + Warm Pools | High | Container orchestration + pool management + mount/egress policies + lifecycle | Cold-start UX degradation; pool sizing; sandbox escape | Warm pools (§13); pre-staged contract images; conservative pool sizing with auto-scaling |
| Dual-Layer Receipt Store | Medium–High | Merkle log + signing + forensic vault encryption + role-gated decryption | Key management; storage growth; vault access control | Separate hot/cold storage; automated key rotation; vault access generates receipts |
| Flight Recorder UI | Medium–High | Progressive disclosure UX + taint visualization + Decision Replay | Log overwhelm; replay performance | Attention-weighted display; anomaly highlighting; lazy loading for deep drill |
| Concierge Channels | Medium | Multi-channel integration + approval cryptography | Notification fatigue; channel-specific UX | Start Slack-only; strict approval crypto from day 1 |
| Merkle Anchoring | Medium | STH generation + inclusion/consistency proofs + external publication | Key management; anchoring cost | Optional; Enterprise tier only; defer to Phase 6 |
| Formation Manager + Transitions | Medium | State machine + Run Pack switching + warm pool rebalancing | Incorrect transitions; pool thrashing | Conservative transition criteria; human override; transition generates Receipt |

Total scope: 14–20 months of focused development. The phasing in §16 delivers value at each stage.

16. Implementation Phases: Trust-First Ordering

Principle: Trust infrastructure precedes high autonomy. Observability before execution. Boundaries before expansion.

Phase 1: Trust Foundation (6-8 weeks)

Goal: Make VAK's actions inspectable and auditable before it does anything autonomously.

Deliverables:

Dual-layer Receipt system (Merkle log + Forensic Vault)

Flight Recorder v1 (Evidence Trail; progressive disclosure)

Sealed credential reference system

Redaction policy framework

Trust-level tagging infrastructure (4 levels defined; tagging on internal events)

Max rung: 0 (observation only)

Success metrics:

| Metric | Target | What It Measures |
| --- | --- | --- |
| Receipt trace coverage | ≥80% of manual PAS actions | Can we actually record what happens? |
| Secret leak rate in receipts | <1% (audited) | Is redaction working? |
| Flight Recorder load time | <2s for 1000-receipt projects | Is the UI usable? |
| Forensic Vault access logging | 100% of decryption events receipted | Is vault access itself auditable? |

Phase 2: Low-Risk Initiative (6-8 weeks)

Goal: VAK can observe and suggest, but cannot execute.

Deliverables:

Signal Bus with top 5 sources (git, CI, dependency CVEs, schedules, cost anomalies)

Signal sanitization pipeline with trust-level tagging

Initiative Compiler v1 (rule-based)

Autonomy Evaluator with Rung 1-2 (Suggest + Draft)

Taint tracking: signal-to-candidate propagation

Basic Concierge Channel (Slack)

Initiative Budget v1 (learned component only; no decay yet)

Max rung: 2 (Draft)

Success metrics:

| Metric | Target | What It Measures |
| --- | --- | --- |
| Time-to-first-draft reduction | ≥30% for targeted workflows | Is VAK saving time? |
| Suggestion approval rate | ≥60% | Are suggestions useful? |
| Signal processing latency | ≥95% within 5s | Is the bus fast enough? |
| Taint classification accuracy | ≥95% (spot-audited) | Are trust labels correct? |
| False positive candidate rate | <10% | Is the compiler wasting effort? |

Reliance calibration metrics (begin tracking):

| Metric | What It Reveals |
| --- | --- |
| Approval rate by category | Are humans rubber-stamping? (overtrust risk) |
| Override frequency | Are humans constantly overriding? (undertrust or miscalibration) |
| "Never again" rule rate | Is the system learning from corrections? |

Phase 3: Safe Execution (8-10 weeks)

Goal: VAK can execute in isolation with hard boundaries.

Deliverables:

Sandbox Executor with warm pools (§13)

Capability Contract schema and three-point enforcement

Taint barriers at Rung 3 execution boundary

Contract Enforcer integrated into PAS pipeline

Sandbox provenance capture in Receipts

Rung 3 for Development Formation

Budget decay mechanism (learned component)

Circuit breaker implementation

Max rung: 3 (Sandbox Execute)

Success metrics:

| Metric | Target | What It Measures |
| --- | --- | --- |
| Sandbox reproducibility | ≥90% (same inputs → same outputs) | Is execution deterministic? |
| Rollback success in sandbox | ≥99% | Can we undo safely? |
| Sandbox escape rate | 0 | Is isolation holding? |
| Contract validation latency | <500ms added to execution startup | Is governance fast enough? |
| Warm pool hit rate | ≥80% of Rung 3 executions use warm sandbox | Is the UX acceptable? |
| Taint barrier enforcement | 100% of external-tainted candidates blocked from Rung 3 | Are taint barriers working? |

Reliance calibration metrics (continue + expand):

| Metric | What It Reveals |
| --- | --- |
| Rung promotion velocity | How fast is trust being earned per category? |
| Rung demotion frequency | Are failures causing appropriate trust reduction? |
| Budget crash recovery time | How long to rebuild trust after a failure? |

Phase 4: Earned Host Access (8-10 weeks)

Goal: VAK can execute on the actual project for proven action categories.

Deliverables:

Composite Initiative Budget (learned + situational components)

Rung 4 execution with scoped mounts and short-lived credentials

Taint barriers at Rung 4 (VERIFIED only)

Formation Manager with automated transitions

Deployment Formation capabilities

Flight Recorder: Decision Replay

Morning briefing generation with out-of-the-loop previews

Max rung: 4 (Host Execute)

Success metrics:

| Metric | Target | What It Measures |
| --- | --- | --- |
| High-severity incidents from Rung 4 | 0 | Is host execution safe? |
| Human time saved per run | Increasing trend without incident rise | Is earned autonomy delivering value? |
| Budget prediction accuracy | Rung 4 readiness correctly predicts success ≥90% | Is the composite model calibrated? |
| Formation transition accuracy | ≥90% correct autonomous transitions | Is the state machine reliable? |
| Situational component responsiveness | Drops within 60s of incident detection | Does the budget respond to danger? |

Reliance calibration metrics (full set):

| Metric | What It Reveals |
| --- | --- |
| Time-to-intervention during incidents | Can humans actually intervene when needed? (out-of-the-loop test) |
| Rung promotion vs demotion velocity | Is trust converging or oscillating? |
| Composite budget disagreement rate | How often do learned and situational components conflict? |
| Out-of-the-loop preview accuracy | Are preview actions actually what VAK does next? |

Phase 5: Full Lifecycle + Operations (8-10 weeks)

Goal: VAK operates as a complete VEO (Virtual Executive Officer) across all four formations.

Deliverables:

Design Formation capabilities

Operations Formation (support triage, P&L, community monitoring)

External signal adapters (support email, forums, billing, ad platforms)

Task Residue tracking

Full Concierge Channel suite (Email, SMS)

Max rung: 4 (all formations)

Success metrics:

| Metric | Target | What It Measures |
| --- | --- | --- |
| Continuous operation | ≥30 days without P0 autonomy failure | Is VAK production-stable? |
| Morning briefing usefulness | >80% rated "useful" | Is the daily output valuable? |
| Support triage accuracy | ≥85% | Can VAK handle Ops? |
| Task Residue surface rate | ≥3 actionable items/week | Is institutional memory working? |
| Formation cycle completion | At least 1 full Design→Dev→Deploy→Ops cycle | Can VAK manage a lifecycle? |

Phase 6: Ecosystem and Enterprise (10-12 weeks)

Goal: Third-party extensibility with enterprise-grade auditability.

Deliverables:

Skill governance pipeline (full)

Signed skill registry with third-party publishers

Capability Contract SDK

Merkle root anchoring (STH + inclusion/consistency proofs)

Enterprise SSO for Concierge approvals

Multi-project VAK instances

Max rung: 4 (bounded by Capability Contracts)

Success metrics:

| Metric | Target | What It Measures |
| --- | --- | --- |
| Time-to-revoke malicious skill | <30 minutes | Can we respond to supply-chain attacks? |
| Ecosystem growth without incidents | Growth rate ≥20%/quarter; incident rate flat | Is the governance pipeline working? |
| Receipt integrity (external audit) | ≥99.9% | Is tamper evidence real? |
| Anchoring cost | Within enterprise budget allocation | Is anchoring economically viable? |

17. Business Model Integration

17.1 VAK as Premium Tier

| Verdict Tier | VAK Access |
| --- | --- |
| Free | None |
| Starter | Signal Bus monitoring only (read-only Flight Recorder) |
| Professional | Rung 1-2 (Suggest + Draft) with 1 formation |
| Team | Rung 1-3 (+ Sandbox execution) with all formations |
| Enterprise | Rung 1-4 (full autonomy) with custom Capability Contracts, Merkle anchoring, Forensic Vault, SSO |

17.2 The Pricing Insight

VAK's credit consumption is naturally gated by the Initiative Budget. Higher autonomy = more execution = more credits consumed. But the value delivered per credit increases with autonomy because VAK completes workflows end-to-end. Customers who trust VAK more use more credits but get disproportionately more value.

17.3 The Enterprise Sales Argument

"Every autonomous action generates a signed receipt with full taint provenance. Your security team can trace any action back to its source signals and verify no untrusted data influenced high-privilege execution."

"The agent operates under a composite trust model — earned competence modulated by current conditions. A good track record doesn't override a dangerous situation."

"Your policy is the ceiling. The evidence is the floor. The budget is the key."

18. Competitive Positioning

18.1 VAK's Unique Position

VAK is the only product that:

Covers the entire project lifecycle (Design → Dev → Deploy → Ops)

Uses composite evidence-based autonomy (learned + situational) instead of binary permissions

Implements pervasive taint tracking from signal source through execution

Operates as a parallel organizational branch, not a tool

Self-regulates through budget decay, circuit breakers, and situational awareness

Provides dual-layer cryptographic auditability (hash audit + encrypted forensics)

Transitions between lifecycle phases autonomously

Maintains human situational awareness through out-of-the-loop safety guarantees

18.2 The Tagline

"Your AI team that earns your trust, not just your permission."

19. The Vision: What This Becomes

When VAK is fully realized, the workflow for launching a software product changes fundamentally:

Human has an idea. Describes it in natural language — a voice note, a rough doc, a Slack message.

VAK enters Design Formation. Generates PRDs, researches competitors, proposes architecture. Human reviews and approves via Concierge Channels.

VAK shifts to Development Formation. Decomposes architecture into tasks, assigns to agent teams, writes code, tests, creates PRs. Every action generates a signed Receipt with taint provenance.

VAK shifts to Deployment Formation. Stages the release, runs canaries, validates health. Rollback plan pre-tested in sandbox. Human gives go/no-go.

VAK shifts to Operations Formation. Monitors production, handles support, tracks business metrics, manages the community. Earns more autonomy in proven categories over weeks and months. Morning briefings keep the human in the loop.

The cycle continues. VAK identifies improvements from production data and user feedback. Proposes new features. Shifts back to Design Formation. The product evolves continuously.

The human's role shifts from doing the work to directing the organization. You become the CEO of your product, not the developer of it. VAK is the organization that executes your vision, 24 hours a day, within the budget and rules you set, earning more autonomy as it proves itself, and never letting you fall out of the loop.

This isn't AI assistance. This is AI organization. This is Verdict.

Appendix A: NIST Trustworthy AI Alignment

VAK's architecture maps to the NIST Trustworthy AI framework characteristics:

| NIST Characteristic | VAK Implementation | Verification Method |
| --- | --- | --- |
| Accountability | Every action produces a signed Receipt linking decision to evidence, budget state, taint provenance, and approver identity | Receipt Store audit; Forensic Vault investigation |
| Transparency | Flight Recorder provides full Evidence Trail; Decision Replay enables counterfactual analysis; out-of-the-loop previews maintain human awareness | Progressive disclosure UI; reliance calibration metrics |
| Explainability | Initiative Compiler records synthesis_reasoning; Budget snapshots show both components; taint chains show data provenance | Evidence Trail view; Decision Replay |
| Privacy | Sealed credential references; Receipt redaction policies; Forensic Vault encryption with role-gated access; vault access itself receipted | Secret leak audits; redaction policy compliance checks |
| Safety | Composite Initiative Budget with situational component; circuit breakers; taint barriers; out-of-the-loop safety invariant; sandbox-by-default | Reliance calibration metrics; incident rate tracking; time-to-intervention measurement |
| Reliability | Deterministic execution boundaries; reproducible sandbox runs; warm pool lifecycle management; formation-aware Run Pack switching | Sandbox reproducibility metrics; formation transition accuracy; continuous operation duration |
| Robustness | Signal sanitization; taint tracking; anti-gaming protections; budget crash on failure; contract enforcement at three validation points | Taint barrier enforcement rate; contract violation detection rate; budget integrity audits |

_"VAK doesn't ask for your trust — it earns it, action by action, evidence by evidence, receipt by receipt. And it never lets you fall out of the loop."_

Updated 2/22/2026


---

How PRD_Orchestration_Pattern Relates to the VAK Document Family

┌─────────────────────────────────────────────────────────────────────┐
│  PRD_VEO_VAK.md                                                     │
│  "The Vision / Why"                                                 │
│                                                                     │
│  VAK is a Virtual Executive Officer. Defines:                       │
│  - 4 Formations (Design/Dev/Deploy/Ops)                             │
│  - Initiative Budget (earned trust, 4 rungs)                        │
│  - Taint tracking, Signal Bus, Merkle receipts                      │
│  - 6 implementation phases (18-24 month arc)                        │
│                                                                     │
│  THIS IS THE ROOT. Everything below derives from it.                │
└────────────┬────────────────────────────────────────────────────────┘
             │
             │ translates vision into implementable architecture
             ▼
┌─────────────────────────────────────────────────────────────────────┐
│  SPEC_VEO_VAK.md v1.1                                               │
│  "The Architecture / How"                                           │
│  (6,508 lines, SSOT)                                                │
│                                                                     │
│  23 sections: Python dataclasses, API endpoints, numeric            │
│  thresholds, integration contracts. A developer can implement       │
│  any component from this doc alone.                                 │
│                                                                     │
│  Key sections for us:                                               │
│  - SS14: Flight Recorder UI ─────────────> PRD_VAK_UI.md            │
│  - SS17.2: PAS Integration ──────────────> PRD_Orchestration_Pattern│
│  - SS17.3: HMI Integration ──────────────> PRD_VAK_UI.md            │
│  - SS17.8: Verdict-Code Integration ─────> PRD_Orchestration_Pattern│
└────────────┬─────────────────────┬──────────────────────────────────┘
             │                     │
             │                     │ implementation roadmap
             │                     ▼
             │   ┌───────────────────────────────────────────────────┐
             │   │  PLAN_VEO_VAK.md                                  │
             │   │  "The Build Order"                                │
             │   │  (1,364 lines)                                    │
             │   │                                                   │
             │   │  Phase 0.5: Observer ──────── May-Jun 2026        │
             │   │  Phase 1.0: Advisor ───────── Jul-Sep 2026        │
             │   │  Phase 2:   Safe Execution ── Oct 2026-Jan 2027   │
             │   │  Phase 3:   Earned Host ───── Feb-Apr 2027        │
             │   │  Phase 4:   Full Lifecycle ── May-Jul 2027        │
             │   │  Phase 5:   Enterprise ────── Aug-Oct 2027        │
             │   │                                                   │
             │   │  Task 1.0.8: Budget-gated middleware              │
             │   │  in manager_base.py ◄──── SHARED FILE             │
             │   └───────────────────────────────────────────────────┘
             │
             │ derives UI requirements from SS14 + SS17.3
             ▼
┌──────────────────────────────┐   ┌──────────────────────────────────┐
│  PRD_VAK_UI.md               │   │  PRD_Orchestration_Pattern.md    │
│  "VAK Display Layer"         │   │  "PAS Execution Layer"           │
│                              │   │                                  │
│  /vak operational dashboard  │   │  Phase-based orchestration       │
│  6-panel grid (3 active now) │   │  Context-aware manager lifecycle │
│  Budget, Signals, Candidates │   │  Hand-off protocol               │
│  Audit integrity, Approvals  │   │  Parallelization planner         │
│                              │   │  Test manager phase              │
│  PURE FRONTEND               │   │  4 HMI + 5 backend               │
│  No backend changes          │   │                                  │
└──────────────┬───────────────┘   └──────────────┬───────────────────┘
               │                                  │
               │         SHARED SURFACES          │
               └────────────────┬─────────────────┘
                                │
                                ▼
           ┌──────────────────────────────────────────┐
           │  Shared Touch Points                     │
           │                                          │
           │  flight_recorder.html                    │
           │    VAK UI: VAK event cards               │
           │    Orch:   Hand-off event cards          │
           │                                          │
           │  SSE event bus (localhost:6122)          │
           │    VAK UI: vak.budget.changed, etc.      │
           │    Orch:   manager_handoff events        │
           │                                          │
           │  /v1/vak/receipts                        │
           │    VAK UI: reads (Audit panel)           │
           │    Orch:   writes (MerkleReceipt)        │
           │                                          │
           │  manager_base.py                         │
           │    PLAN 1.0.8: budget-gated middleware   │
           │    Orch VC-2:  context tracking + handoff│
           │                                          │
           │  verdict-phase-dashboard.jsx             │
           │    Both: shared design reference         │
           └──────────────────────────────────────────┘
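The shared SSE event bus carries events from both documents' scopes, so a consumer needs to route frames by event name. The following is a minimal sketch of that routing, not the actual VAK client: only the event names `vak.budget.changed` and `manager_handoff` come from the diagram, while `parse_sse`, `EventRouter`, and every payload field are hypothetical. It parses already-received text lines rather than opening a connection to the bus.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Handler = Callable[[dict], None]


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from Server-Sent Events text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line ends one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):           # comment / keep-alive line
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))


class EventRouter:
    """Hypothetical dispatcher: one handler list per event name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def dispatch(self, lines: Iterable[str]) -> int:
        """Route every parsed event; return how many handler calls were made."""
        handled = 0
        for name, payload in parse_sse(lines):
            for handler in self._handlers.get(name, []):
                handler(payload)
                handled += 1
        return handled


budget_updates: List[dict] = []
handoffs: List[dict] = []
router = EventRouter()
router.on("vak.budget.changed", budget_updates.append)   # VAK UI concern
router.on("manager_handoff", handoffs.append)            # Orchestration concern

# Assumed payload shapes, for illustration only.
sample_stream = [
    "event: vak.budget.changed\n",
    'data: {"budget": 0.42}\n',
    "\n",
    ": keep-alive\n",
    "event: manager_handoff\n",
    'data: {"from_manager": "dev", "to_manager": "test"}\n',
    "\n",
]
handled_count = router.dispatch(sample_stream)
```

Keeping one handler list per event name means the VAK UI cards and the hand-off cards can be registered independently, which mirrors the shared-surface split shown in the diagram.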

The Key Distinction

PRD_VAK_UI.md answers:       "What does the user SEE about VAK?"
                                 ↓
                             /vak dashboard, budget bars, signal health,
                             candidate approvals, audit chain

PRD_Orchestration_Pattern    "How does PAS EXECUTE work in phases?"
answers:                         ↓
                             Phase DAG, manager hand-off, parallel windows,
                             context lifecycle, test-as-phase

PLAN_VEO_VAK.md answers:     "In what ORDER do we build VAK itself?"
                                 ↓
                             Phase 0.5 (Observer) → 1.0 (Advisor) → ...
                             18-24 month arc, starts May 2026
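The "Phase DAG" and "parallel windows" named above can be illustrated with a small scheduling sketch. This is one possible reading, not the Orchestration Pattern's actual Phase Dependency Engine: the function `parallel_windows`, the phase identifiers, and the dependency edges are all assumptions, loosely modeled on the Orchestration Pattern phases mentioned in this document (Foundation, Backend, HMI, TestMgr, Integration). It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_windows(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group phases into windows; every phase in a window has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises graphlib.CycleError on a cyclic graph
    windows: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # phases that can run side by side
        windows.append(ready)
        sorter.done(*ready)                 # unblock their successors
    return windows


# Hypothetical edges: each phase maps to the phases it must wait for.
phase_deps = {
    "phase_0_foundation": set(),
    "phase_1_backend": {"phase_0_foundation"},
    "phase_2_hmi": {"phase_1_backend"},
    "phase_3_test_manager": {"phase_1_backend"},
    "phase_4_integration": {"phase_2_hmi", "phase_3_test_manager"},
}
windows = parallel_windows(phase_deps)
```

Under these assumed edges, the HMI and test-manager phases land in the same window, which is the kind of grouping a parallelization planner would presumably look for.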

Timeline Overlap (The Coordination Risk)

2026 Timeline:

Mar 20       Apr          May          Jun          Jul          Aug
│            │            │            │            │            │
▼            ▼            ▼            ▼            ▼            ▼
MVP          Post-MVP     ├── PLAN_VEO_VAK Phase 0.5 ──┤
Launch       Stabilize    │   (Observer, 6-8 weeks)    │
                          │                            │
             ├── Orch Pattern Phase 0-1 ──┤            │
             │   (Foundation + Backend,   │            │
             │    3 weeks)                │            │
             │                            ├── Orch Phase 2+3 ──┤
             │                            │   (HMI + TestMgr,  │
             │                            │   2 weeks parallel)│
             │                            │                    ├── Orch Phase 4 ─┤
             │                            │                    │   (Integration) │
             │
             ▲
             │
CONFLICT ZONE: Both modify manager_base.py
  Orch VC-2 (context tracking) + PLAN 1.0.8 (budget middleware)
  These are ADDITIVE (different concerns) but need coordination
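To make "additive, different concerns" concrete, here is a hedged sketch of how the two changes to manager_base.py could coexist as independent wrappers around the same method. Nothing here is taken from the real file: `ManagerState`, `budget_gated`, `context_tracked`, the cost and token numbers, and the hand-off threshold are all hypothetical stand-ins for PLAN 1.0.8 and Orch VC-2.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Callable, List


class BudgetExceeded(RuntimeError):
    """Raised when a task costs more than the remaining budget."""


@dataclass
class ManagerState:
    budget_remaining: float
    context_tokens_used: int = 0
    handoff_threshold: int = 1000          # assumed value, illustration only
    log: List[str] = field(default_factory=list)


def budget_gated(cost: float) -> Callable:
    """PLAN 1.0.8 concern (sketch): refuse to run when the budget is too low."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(state: ManagerState, *args, **kwargs):
            if state.budget_remaining < cost:
                raise BudgetExceeded(f"{fn.__name__} needs {cost}")
            state.budget_remaining -= cost
            return fn(state, *args, **kwargs)
        return wrapper
    return decorator


def context_tracked(tokens: int) -> Callable:
    """Orch VC-2 concern (sketch): count context use and flag a hand-off."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(state: ManagerState, *args, **kwargs):
            result = fn(state, *args, **kwargs)
            state.context_tokens_used += tokens
            if state.context_tokens_used >= state.handoff_threshold:
                state.log.append("handoff_requested")
            return result
        return wrapper
    return decorator


@budget_gated(cost=2.0)        # budget check runs first
@context_tracked(tokens=600)   # context accounting runs after the task body
def run_task(state: ManagerState, name: str) -> str:
    state.log.append(f"ran:{name}")
    return name


state = ManagerState(budget_remaining=5.0)
run_task(state, "build")
run_task(state, "test")
```

Because each wrapper touches only its own fields on the state object, either change could be merged first. The coordination that remains is mostly about agreeing on wrapper order and on the shared method signatures.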

What This Means Practically

1. PRD_Orchestration_Pattern does NOT replace any VAK doc. It is a peer that adds PAS execution capabilities that VAK will eventually consume.

2. The Orchestration Pattern makes VAK more powerful: when VAK Phase 1.0 (Advisor) reaches budget-gated tool execution, it will use the same manager_base.py that now has context tracking and hand-off. VAK's Kernel Loop can eventually schedule phase-based work through the Orchestration Pattern's Phase Dependency Engine.

3. One file conflict to manage: manager_base.py gets modified by both PLAN task 1.0.8 (budget middleware) and Orchestration VC-2 (context lifecycle). These are additive (different concerns in the same file) but need merge coordination.

4. Receipts flow one direction: Orchestration Pattern writes MerkleReceipt with action_type="manager_handoff" → VAK audit service stores them → VAK UI reads them in the Audit panel. Clean producer/consumer split.
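The one-directional receipt flow in point 4 can be sketched as an append-only store with a writer side and a reader side. This is an illustrative simplification under stated assumptions: it uses a linear SHA-256 hash chain rather than a full Merkle tree, and apart from the `MerkleReceipt` name and the `action_type="manager_handoff"` value, every field, class, and function here (`payload`, `prev_hash`, `ReceiptStore`, `verify_chain`) is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import List

GENESIS = "0" * 64  # assumed placeholder hash for the first receipt


@dataclass(frozen=True)
class MerkleReceipt:
    action_type: str
    payload: dict
    prev_hash: str
    receipt_hash: str


def _digest(action_type: str, payload: dict, prev_hash: str) -> str:
    """Hash the receipt body together with the previous receipt's hash."""
    body = json.dumps(
        {"action_type": action_type, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ReceiptStore:
    """Stand-in for the VAK audit service: append-only and hash-linked."""

    def __init__(self) -> None:
        self._receipts: List[MerkleReceipt] = []

    def write(self, action_type: str, payload: dict) -> MerkleReceipt:
        """Producer side (Orchestration Pattern in this sketch)."""
        prev = self._receipts[-1].receipt_hash if self._receipts else GENESIS
        receipt = MerkleReceipt(
            action_type, payload, prev, _digest(action_type, payload, prev)
        )
        self._receipts.append(receipt)
        return receipt

    def read_all(self) -> List[MerkleReceipt]:
        """Consumer side (Audit panel in this sketch); returns a copy."""
        return list(self._receipts)


def verify_chain(receipts: List[MerkleReceipt]) -> bool:
    """Check every link and every stored hash, in order."""
    prev = GENESIS
    for r in receipts:
        expected = _digest(r.action_type, r.payload, r.prev_hash)
        if r.prev_hash != prev or r.receipt_hash != expected:
            return False
        prev = r.receipt_hash
    return True


store = ReceiptStore()
store.write("manager_handoff", {"from_manager": "dev", "to_manager": "test"})
store.write("manager_handoff", {"from_manager": "test", "to_manager": "deploy"})

chain = store.read_all()
chain_ok = verify_chain(chain)

# Editing an earlier payload without recomputing hashes breaks verification.
tampered = [replace(chain[0], payload={"from_manager": "dev", "to_manager": "ops"})] + chain[1:]
tamper_detected = not verify_chain(tampered)
```

The reader never writes and the writer never reads, so an audit-integrity check of this kind could run entirely on the consumer side.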
