VAK: Deep-Researched Validation and Design Hardening for the Verdict Autonomy Kernel
2/19/2026
Executive Summary
The market window described in PRD_VAK_v1.1 is real: viral demand for agentic autonomy has been empirically demonstrated, and so has the rapid trust collapse that follows when autonomy is shipped without professional-grade governance, isolation, and audit. In the past three weeks, the OpenClaw ecosystem’s growth (reported at 100k+ GitHub stars and ~2M visitors in a week) and its concurrent incident cascade (malicious “skills,” exposed gateways, and credential theft) created a concrete “adoption → incident → ban” loop that VAK is explicitly designed to break.
Several enterprise reactions—including internal restrictions and outright bans by large firms such as Meta—are consistent with the “binary autonomy is unsustainable” thesis in your PRD. The key research-backed differentiation for VAK is that it treats trust as the primitive: autonomy becomes a _metered, evidence-gated scalar_ rather than an on/off permission toggle. This direction is strongly aligned with decades of human-automation research showing that trust and reliance are calibrated through competence, predictability, and transparency, and that failures emerge as misuse/disuse/abuse when that calibration breaks.
The strongest validation comes from independent, high-credibility security analysis: MITRE ATLAS characterizes agentic ecosystems as introducing exploit chains where attackers can convert “features” (skills, configuration, tool invocation, memory) into end-to-end compromise paths in seconds. Your VAK primitives—Signal Bus sanitization, Capability Contracts, sandbox-by-default execution, sealed credentials, receipts with tamper-evidence, and an Autonomy Ladder—map directly onto those documented attack graphs and mitigation recommendations.
Two hardening opportunities stand out from the research:
Trust-labeled memory and dataflow (“taint tracking”) must be first-class. MITRE explicitly calls out “undifferentiated memory by source” as a key vulnerability class in agentic systems. VAK’s Signal Bus sanitization is a start, but the Receipt Store and Flight Recorder should also preserve _trust provenance for every memory write and every tool input_, with policy blocking “untrusted → high-privilege tool” edges by construction.
Professional trust requires “auditability you can operate,” not just “immutability you can claim.” Certificate Transparency–style append-only Merkle logs and Sigstore-style transparency logs provide a proven design pattern: keep sensitive payloads off-chain; anchor signed Merkle roots; support inclusion/consistency proofs. This supports VAK’s optional blockchain anchoring while avoiding “put everything on-chain” pitfalls.
This report validates the PRD’s core assertions with primary and authoritative sources (OpenClaw official docs/blog, MITRE ATLAS, NVD, Reuters, incident disclosure research), and then refines the VAK design into an implementable, Verdict-native architecture with concrete data models, hooks, diagrams, tradeoffs, and a phased MVP plan.
OpenClaw as baseline and the copycat wave
OpenClaw’s official README describes a Gateway “control plane” built around a single WebSocket endpoint, with a Control UI and WebChat served directly from the gateway; it supports many messaging surfaces and remote exposure via tooling such as Tailscale Serve/Funnel or SSH tunnels. Its “power” is not a mystery: it is engineered to connect agent reasoning to actionable tools (browser control, device nodes, system actions, etc.), which is the same capability class VAK aims to professionalize.
OpenClaw’s own security documentation reads like a runbook for “footguns”: warnings cover unauthenticated bindings, reverse-proxy loopback bypass conditions, insecure control UI auth modes, and disabled device-auth checks; the docs also point users to security audits and redaction controls—evidence that the ecosystem is fighting real-world misconfiguration patterns at scale. The key point for VAK is not that OpenClaw “ignored security,” but that binary autonomy plus fast-growing extensibility expands the attack surface faster than reactive hardening can close it.
The copycat wave reinforces this: major alternatives are optimizing for auditability-first (smaller codebases) and isolation-first (containerization) as a trust strategy, not primarily for new features. For example, NanoClaw positions itself as a lightweight alternative that runs in containers for security while retaining core personal-assistant traits (messaging integration, scheduled jobs). Meanwhile, nanobot emphasizes ~4,000 lines of core code to create a readable, research-friendly agent skeleton—an “auditability-first” response.
A third theme is “make autonomy visible,” exemplified by Crabwalk, which provides a real-time live graph monitor of agent sessions, tool calls, and response chains via WebSocket integration—validating your Flight Recorder thesis that observability is not a bolt-on but a delight/trust driver.
Copycat survey with key differences
| Project | Core posture | What it keeps (user-loved traits) | What it changes (trust strategy) |
|---|---|---|---|
| NanoClaw | Isolation-first | Messaging presence, memory, schedules | Container-by-default execution boundary |
| nanobot | Auditability-first | Core agent loop with simple deployment | Shrinks codebase to make review/audit plausible |
| Crabwalk | Observability-first | Works with messaging-based agent workflows | Live-node graph + tool-call tracing as a trust UI |
| Cloudflare / Moltworker | Managed sandbox ops | Retains OpenClaw workflows | Moves runtime into a managed environment with admin UI and Access controls |
Takeaway for VAK: there is no single “winning axis” (smaller code vs. stronger sandbox vs. better UX). The research suggests the sustainable solution is to unify the axes into a trust-native system—exactly what the Initiative Budget + Autonomy Ladder system is attempting.
Incident-driven threat landscape and why binary autonomy collapses trust
Your PRD’s incident timeline is strongly supported by external reporting and primary disclosures:
A large-scale malicious-skill campaign (“ClawHavoc”) was publicly documented: a marketplace audit by Koi Security found 341 malicious skills, a finding widely republished by security outlets.
OpenClaw’s maintainers responded with a partnership with VirusTotal to add deterministic packaging, SHA-256 fingerprinting, lookups, and Code Insight scans.
Infostealers have been observed extracting OpenClaw configuration files containing tokens/keys, highlighting the “agent soul harvesting” risk of agent config/state directories.
Moltbook’s database exposure (misconfigured backend) was reported, including exposure of private agent messages and large volumes of credentials/tokens; the disclosure aligns with the “vibe coding” risk narrative.
Public exposure of OpenClaw control interfaces and large-scale “internet-facing agent” risk has been measured by scanning firms; Censys (Jan 31) documented 21k+ exposed deployments, and MITRE ATLAS references the unique danger of exposed control interfaces enabling credential access and execution.
The CVE record for a “one-click” compromise chain exists in the U.S. National Vulnerability Database: CVE-2026-25253 describes unvalidated gatewayUrl ingestion and automatic WebSocket connection behavior (patched in 2026.1.29).
A research-grade summary of these incidents and what “broke”:
| Date (2026) | Incident class | What broke in system terms |
|---|---|---|
| Late Jan–early Feb | Public exposure of control plane | Control interfaces reachable; credentials in config become reachable; tool invocation becomes attacker-controlled via chat/tool APIs |
| Feb 1–3 | Skill supply-chain compromise | Unvetted extensions execute with broad privileges; social engineering causes users to run payload fetchers; “skills” become malware loaders |
| Feb 1 onward | Browser/URL attack chains | One-click RCE and cross-site WebSocket hijacking behavior chains exploit UI/WS trust assumptions |
| Feb 3 onward | Indirect prompt injection → C2 persistence | Untrusted web content can poison agent behavior and induce tool invocation; persistence achieved by writing attacker-controlled instructions into agent context/state |
| Mid Feb | Infostealer config harvesting | Commodity malware targets agent directories for tokens/keys/context, enabling agent impersonation/lateral movement |
| Feb onward | Organizational bans | Risk posture triggers restrictions and bans; “fun autonomy” becomes unshippable inside enterprise environments |
Attacker patterns that matter for VAK’s architecture
MITRE ATLAS’s analysis is especially valuable because it reframes agent security: the most dangerous exploits are not “low-level bugs alone,” but “high-level abuses of trust, configuration, and autonomy” that convert features into compromise paths quickly. This directly validates your PRD’s claim that “stronger cages” (harder sandboxes) are not sufficient without judgment and governance.
The recurring technique clusters in MITRE ATLAS include: direct/indirect prompt injection, tool invocation abuse, and modification of agent configuration. These correlate tightly with OWASP’s LLM application risk taxonomy, which explicitly lists Prompt Injection and Supply Chain Vulnerabilities among top risks, and separately highlights “Excessive Agency” as a broader failure pattern in deployed systems.
For VAK, this implies a non-negotiable design principle:
No untrusted input should ever directly cause high-privilege tool invocation without an intervening, enforceable boundary (policy + budget + sandbox). The PRD already proposes Signal Bus sanitization and Autonomy Ladder gating; the research indicates these should be extended into a pervasive “trust-labeled dataflow” model so that memory, candidates, receipts, and tool calls preserve provenance and enforce “taint barriers.”
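The taint-barrier principle can be expressed as a single policy check. The enums and threshold logic below are illustrative assumptions of this sketch, not the PRD's actual type names:

```python
from enum import IntEnum

class Trust(IntEnum):
    # Lower value = less trusted. Names are illustrative, not the PRD's enum.
    UNTRUSTED = 0      # web content, inbound chat, third-party signals
    VERIFIED = 1       # signed/authenticated sources
    OPERATOR = 2       # explicit human instruction

class Privilege(IntEnum):
    READ_ONLY = 0
    SANDBOXED = 1
    HOST = 2           # high-privilege: host filesystem, credentials, network

def taint_barrier(signal_trust: Trust, tool_privilege: Privilege,
                  human_promoted: bool = False) -> bool:
    """Return True if the tool call may proceed.

    Untrusted-derived candidates never reach high-privilege tools
    directly; they require an explicit, recorded promotion step.
    """
    if tool_privilege < Privilege.HOST:
        return True                  # low-privilege edges always allowed
    if signal_trust >= Trust.VERIFIED:
        return True                  # trusted provenance may proceed
    return human_promoted            # untrusted -> HOST only via promotion

# An untrusted web signal cannot drive a host-level tool call on its own...
assert not taint_barrier(Trust.UNTRUSTED, Privilege.HOST)
# ...unless a policy- and budget-verified promotion step intervenes.
assert taint_barrier(Trust.UNTRUSTED, Privilege.HOST, human_promoted=True)
```

The key property is that the dangerous edge (untrusted → host) is blocked by construction rather than by prompt-level sanitization alone.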
Trust and human factors research that supports the Initiative Budget
The Initiative Budget thesis is strongly aligned with well-established human factors research:
Lee & See argue that trust guides reliance when automation is complex, and that design should aim for appropriate reliance rather than maximal trust.
Parasuraman & Riley’s taxonomy of use, misuse, disuse, and abuse explains why binary autonomy produces catastrophic swings: overtrust can lead users to grant broad authority; undertrust leads to bans and abandonment; and misdesign produces systemic harm.
Endsley & Kaber explicitly note that automation has often been treated as a binary allocation between human and machine; they studied levels of automation and how these affect performance and situation awareness in dynamic control tasks.
Hoff & Bashir provide a three-layer trust model emphasizing variability of trust (dispositional, situational, learned), reinforcing the need for time-varying trust mechanisms (decay, crashes, task-category specificity).
This body of work doesn’t just support the Autonomy Ladder concept; it suggests specific implementation constraints:
Intermediate autonomy levels are not optional: they are a safety valve against “out-of-the-loop” problems, where humans lose the ability to intervene effectively because they are reduced to monitors of opaque automation.
Trust calibration requires strong feedback loops: users need legibility (why did it do this?), predictability (what will it do next?), and reversibility (can I undo it?). Your Flight Recorder and undo-chain receipts align with these requirements.
Trust must be governed within a risk framework, not only “felt.” NIST frames trustworthy AI characteristics such as accountability, transparency, explainability, privacy, safety, and reliability—attributes that VAK is operationalizing through receipts, auditable budgets, and boundary enforcement.
A crucial nuance from these sources: trust is not “earned once.” It is learned, situational, and decays when conditions change or when automation behaves unexpectedly. Your PRD’s budget decay, failure crash, and circuit breakers are therefore not just product heuristics; they are consistent with how humans recalibrate reliance on imperfect automation.
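Decay and crash dynamics can be sketched as a small ledger. The half-life and crash factor below are placeholder assumptions, not tuned values from the PRD:

```python
class InitiativeBudget:
    """Per-category trust ledger with decay and failure crash (illustrative).

    half_life_days and crash_factor are assumptions for this sketch.
    """
    def __init__(self, score: float = 0.5, half_life_days: float = 14.0):
        self.score = score            # 0.0 (no autonomy) .. 1.0 (full trust)
        self.half_life_days = half_life_days

    def decay(self, idle_days: float) -> None:
        # Trust decays toward zero when no fresh evidence arrives
        # (learned, time-varying trust per Hoff & Bashir).
        self.score *= 0.5 ** (idle_days / self.half_life_days)

    def record_success(self, weight: float = 0.05) -> None:
        self.score = min(1.0, self.score + weight)

    def record_failure(self, crash_factor: float = 0.5) -> None:
        # A failure "crashes" the budget multiplicatively rather than
        # decrementing linearly, mirroring how humans recalibrate
        # reliance after automation surprises.
        self.score *= crash_factor

b = InitiativeBudget(score=0.8)
b.record_failure()          # one incident halves earned trust: 0.4
b.decay(idle_days=14)       # one half-life of inactivity halves it again: 0.2
assert abs(b.score - 0.2) < 1e-9
```

Successes rebuild trust additively and slowly; failures and inactivity remove it multiplicatively and fast, which is the asymmetry the human factors literature describes.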
Verdict-native VAK architecture: research-backed refinements and concrete implementation
This section treats PRD_VAK_v1.1 as the pre-architecture spec and then hardens it using the research above—especially MITRE ATLAS’s findings about memory taint, configuration abuse, and tool invocation chains.
The VAK control plane as a “governed autonomy runtime”
The most robust framing is:
Autonomy Kernel is the _only_ entity authorized to escalate from “reasoning” to “acting.”
All tools exist behind Capability Contracts (declarative privileges + attestations + limits).
All execution is routed through an Autonomy Ladder (Suggest → Draft → Sandbox Execute → Host Execute).
The Initiative Budget is a per-category, decayable trust ledger that gates rung eligibility.
Every action produces a Receipt, recorded in a tamper-evident log; optional external anchoring provides non-repudiation without exposing sensitive data.
This is consistent with MITRE ATLAS mitigation themes: restrict tool invocation on untrusted data, privilege segmentation, human-in-the-loop for high-impact actions, and telemetry logging.
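The rung-gating rule can be sketched as a single resolution function. The rung numbering (0..4, with 3 = sandbox execute and 4 = host execute) follows the roadmap table later in this report; the budget thresholds are assumptions of the sketch:

```python
def eligible_rung(budget_score: float, contract_min_rung: int,
                  requested_rung: int, sandbox_required: bool) -> int:
    """Resolve the rung a candidate may actually run at (illustrative).

    Thresholds mapping budget score -> maximum rung are assumed values.
    """
    thresholds = {0: 0.0, 1: 0.1, 2: 0.3, 3: 0.6, 4: 0.85}
    granted = max(r for r, t in thresholds.items() if budget_score >= t)
    rung = min(requested_rung, granted)        # budget caps ambition
    if sandbox_required:
        rung = min(rung, 3)                    # contract forbids host exec
    if rung < contract_min_rung:
        raise PermissionError("budget does not yet cover this contract")
    return rung

# A mid-trust budget demotes a host-execute request to sandbox execution.
assert eligible_rung(0.7, contract_min_rung=2, requested_rung=4,
                     sandbox_required=False) == 3
```

The kernel, as the only escalation authority, calls this once per candidate; tools never see a rung higher than the minimum of what was requested, what the budget has earned, and what the contract permits.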
Minimal architecture diagram (mermaid)
Signal sources feeding the architecture: Git/PR/Repo events, CI/CD + test results, runtime/prod telemetry, support + community inputs, and billing/spend telemetry.

Data model sketches (hardened against real attack patterns)
The PRD’s ActionCandidate, CapabilityContract, and Receipt models are directionally strong. The research suggests two refinements:
• Add _first-class provenance and trust labels_ on any field that can be influenced by untrusted inputs (signals, memory, retrieved context), because MITRE flags undifferentiated memory as a core vulnerability.
• Add explicit “tool-call taint barriers” so that any candidate derived from untrusted sources cannot reach high-privilege tools without a policy- and budget-verified promotion step, consistent with OWASP’s Prompt Injection risk.
ActionCandidate
| Field | Type | Research-driven note |
|---|---|---|
| action_id | UUID | Stable identity for receipts and replay |
| trigger_signals | list | Must include trust tags and source provenance |
| summary | string | Human-legible “why” phrasing improves calibrated reliance |
| plan_steps | list | Supports intermediate autonomy level design |
| required_capabilities | list | Must be contract refs; no “ambient tool access” |
| risk_profile | struct | Include data sensitivity + blast radius + reversibility |
| taint_state | enum/struct | Derived from signal trust; blocks unsafe edges |
| rollback_strategy | struct | Reversibility improves reliance and reduces “ban” impulse |
| budget_category | enum | Enables per-category trust (learned trust model) |
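The provenance requirement can be made concrete with a minimal dataclass sketch (field names beyond those in the table, and the integer trust scale, are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    source: str        # provenance, e.g. "github.pr", "web.fetch" (examples)
    trust: int         # 0 = untrusted .. 2 = operator (illustrative scale)

@dataclass
class ActionCandidate:
    action_id: str
    trigger_signals: list   # list[Signal], each carrying trust + provenance
    budget_category: str

    @property
    def taint_state(self) -> int:
        # Taint is pessimistic: a candidate is only as trusted as its
        # least-trusted trigger signal, making MITRE's "undifferentiated
        # memory by source" concern explicit in the data model.
        return min(s.trust for s in self.trigger_signals)

c = ActionCandidate(
    action_id="a-1",
    trigger_signals=[Signal("github.pr", 1), Signal("web.fetch", 0)],
    budget_category="code_review",
)
assert c.taint_state == 0   # one untrusted input taints the whole candidate
```

Because `taint_state` is derived rather than asserted, no upstream component can launder an untrusted signal into a trusted candidate.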
CapabilityContract
| Field | Type | Research-driven note |
|---|---|---|
| publisher_identity | struct | Contract provenance fights supply-chain compromise |
| signatures/attestations | list | Align to supply-chain provenance norms (SLSA / attestations) |
| filesystem/network/process scopes | struct | Least privilege directly mitigates tool abuse |
| data_classes | set | Prevents accidental sensitive data handling |
| side_effects | set | Enables budget/risk computation and audit reviews |
| limits | struct | Mitigates Model DoS / runaway costs (OWASP) |
| minimum_rung | int | Enforces intermediate autonomy progression |
| sandbox_required | bool | Isolation-first control reduces blast radius |
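Scope enforcement at invocation time can be sketched as a prefix check over the contract's declared filesystem scopes. The prefix semantics are an assumption of this sketch; a production enforcer would also resolve symlinks and normalize `..` segments before checking:

```python
from pathlib import PurePosixPath

def path_in_scope(requested: str, allowed_prefixes: list) -> bool:
    """Least-privilege filesystem check for a CapabilityContract (sketch).

    A tool call's path must fall under one of the contract's declared
    scopes; anything else is denied by construction. Real enforcement
    must canonicalize paths (symlinks, "..") before this check.
    """
    req = PurePosixPath(requested)
    for prefix in allowed_prefixes:
        pre = PurePosixPath(prefix)
        if req == pre or pre in req.parents:
            return True
    return False

scopes = ["/workspace/repo", "/tmp/vak"]
assert path_in_scope("/workspace/repo/src/main.py", scopes)
assert not path_in_scope("/home/user/.ssh/id_ed25519", scopes)  # denied
```

The same pattern extends to network egress (allowed host lists) and process scopes; the point is that the contract, not the tool, decides what is reachable.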
Receipt
Receipts should be implemented as an append-only Merkle log, similar to transparency log patterns: a signed root commits to all entries and enables inclusion/consistency proofs. The PRD’s chain-integrity approach (“previous_receipt_hash”) is compatible with this design.
| Field | Type | Why it matters |
|---|---|---|
| budget_snapshot | struct | Enables “why could it act?” replay and audit |
| taint_snapshot | struct | Proves whether untrusted inputs influenced execution |
| tool_calls | list | Must record contract ref per call (non-repudiation) |
| sealed_credential_refs | list | Supports credential-hygiene against infostealer targeting |
| sandbox_provenance | struct | Critical for containment and forensics |
| merkle_leaf, signature, root_epoch | bytes/metadata | Enables tamper evidence and external anchoring patterns |
Optional anchoring: proven patterns, not “crypto for vibes”
The PRD’s “tamper-evident anchoring” language matches established transparency approaches:
• Certificate Transparency (RFC 6962) uses a Merkle tree with signed tree heads and supports inclusion/consistency proofs.
• Sigstore’s Rekor describes an append-only transparency log whose validity can be cryptographically verified, with periodically signed Merkle roots.
Those patterns support VAK’s claim: audit integrity can be achieved without putting sensitive payloads on-chain, and anchoring becomes a periodic “root notarization” step.
```mermaid
sequenceDiagram
    participant AK as Autonomy Kernel
    participant RS as Receipt Store
    participant FR as Flight Recorder
    participant EA as External Anchoring
    AK->>RS: append(receipt_hash, receipt_signature)
    RS->>RS: update_merkle_tree()
    RS->>FR: publish(SignedTreeHead + inclusion proofs)
    alt anchoring enabled
        RS->>EA: publish(SignedTreeHead root)
        EA-->>RS: anchor_ref (tx/log entry)
    end
```
Where VAK hooks into Verdict
The following is a concrete, Verdict-native integration map (as requested). It is written as a design intent, not a claim about current implementation.
| VAK module | PAS hook | Gateway hook | Web Publisher hook | HMI hook |
|---|---|---|---|---|
| Signal Bus | Subscribe to run lifecycle events; CI/test telemetry ingestion | Subscribe to chat/approval events | Subscribe to build/publish/diff events | Subscribe to user feedback, overrides |
| Initiative Compiler | Task decomposition + run-pack synthesis | Escalation packaging for Concierge Channels | Candidate generation for publishing workflows | Explainable “why this candidate exists” UI |
| Initiative Budget | Per-category ledgers tied to run outcomes | Human approvals/rejections adjust trust | Deployment outcomes feed budget success rate | Budget dashboards, override controls |
| Autonomy Ladder | Enforce rung gating on tool execution | Approval flows + replay protection | Promote staged deploys via rung gating | Progressive disclosure of actions |
| Capability Contracts | Bind tools to contracts; enforce at invocation | Policy injection for channel-sensitive actions | Publishing actions constrained by contract | Contract review and install UX |
| Receipt Store | Run receipts as first-class artifacts | Concierge approvals stored as receipt artifacts | Publish receipts after deploy/publish | Evidence trail and decision replay |
Security and privacy tradeoffs
The research indicates several “professional autonomy” tradeoffs that should be made explicit in PRD→architecture work:
• Hash-only receipts improve privacy but can impair debugging. Consider a dual-layer model: redacted hashes for the append-only receipt log, and a separately access-controlled “forensic vault” that stores encrypted, role-gated artifacts for legitimate investigations. The infostealer targeting trend makes it risky to store raw tokens/args in ordinary logs.
• Sandboxing reduces blast radius but increases operational complexity and cold-start latency. Moltworker’s docs note 1–2 minute cold starts for containerized environments, reinforcing that “always-on” can be expensive unless lifecycle/hibernation and caching are engineered.
• Signal sanitization cannot be perfect; therefore “taint barriers + rung gating” must be the fail-safe. MITRE highlights prompt injection and config manipulation as recurring techniques; OWASP standardizes prompt injection as a top risk.
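The dual-layer receipt model from the first bullet can be sketched as a data split. The sensitive field names are assumptions for illustration; vault encryption and role-based access control are elided:

```python
import hashlib
import json

SENSITIVE = {"api_token", "raw_args"}   # field names assumed for the sketch

def split_receipt(payload: dict):
    """Dual-layer receipt: redacted public record + vault-only artifact.

    The public record carries a hash committing to the full payload, so
    the append-only log stays tamper-evident while secrets live only in
    the (encrypted, role-gated) forensic vault.
    """
    public = {k: v for k, v in payload.items() if k not in SENSITIVE}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()
    public["payload_sha256"] = digest     # commits to the full payload
    vault_entry = (digest, payload)       # stored encrypted, role-gated
    return public, vault_entry

pub, (vault_key, _) = split_receipt(
    {"tool": "git_push", "api_token": "s3cr3t", "raw_args": ["--force"]})
assert "api_token" not in pub            # secrets never enter the log
assert pub["payload_sha256"] == vault_key  # log still commits to vault entry
```

Investigators with vault access can later prove the vaulted artifact matches the public log entry by recomputing the hash, preserving auditability without exposing tokens to infostealer-class threats.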
Implementation complexity (validated by ecosystem lessons)
OpenClaw’s “security runbook” and the copycat trend suggest users reward safety primitives, but only if they are operable and don’t destroy UX.
| Component | Complexity | Why (in practice) | Major failure mode |
|---|---|---|---|
| Receipt Store (Merkle + signing) | Medium | Straightforward crypto + storage patterns exist | Key management and rotation |
| Flight Recorder UI | Medium–High | Requires “answerability” UX, not raw logs | Overwhelming noise → distrust |
| Capability Contracts + Governance | High | Supply-chain risk is real and immediate | Bottlenecks and dev friction |
| Initiative Budget | High | Trust calibration is hard; must resist gaming | Autonomy inflation or over-conservatism |
| Signal Sanitization | Medium | Must treat external content as hostile | False negatives cause unsafe tool use |
| Sandbox Executor | High | Isolation + mounts + egress logs are nontrivial | Escapes/misconfig; developer pain |
MVP roadmap: trust-first ordering and success metrics
The “trust-first ordering” in the PRD is supported by both empirical incidents (incidents precede bans) and the trust/reliance research (opacity drives misuse/disuse).
Phased roadmap table
| Phase | Deliverables | Max rung enabled | Success metrics (examples) |
|---|---|---|---|
| Trust foundation | Receipts + Flight Recorder v1 + redaction + sealed-credential references | 0–1 | ≥90% actions traceable end-to-end; 0 critical secret leaks in receipts (audited) |
| Bounded initiative | Signal Bus (top sources) + Candidate generation + Suggest/Draft flows | 2 | ≥30% reduction in time-to-first-draft; ≥60% “useful” rating on briefings |
| Safe execution | Sandbox executor + Capability Contracts (curated) + contract enforcement | 3 | ≥95% sandbox runs reproducible; ≥99% rollback success in sandbox |
| Earned host access | Budget v2 (decay/crash/circuit breakers) + scoped host executor + promotion UX | 4 | 0 high-severity incidents from rung-4 actions; mean time-to-approve decreases without incident rise |
| Full lifecycle formations | Formation manager + Operations formation (briefings, residue, triage) | 4 | 30-day continuous run without P0 autonomy failure; triage accuracy ≥85% |
| Ecosystem + enterprise | Signed skill registry + staged rollout + revocation + optional anchoring | 4 | Mean time to revoke malicious skill <30 min; external audit verifies receipt integrity |
Timeline diagram (mermaid)
VAK rollout (trust-first), spanning roughly twelve months, in six sequential tracks:

- Trust infrastructure: Receipts + redaction + sealed refs; Flight Recorder v1
- Bounded initiative: Signal Bus + candidate generation; Suggest/Draft UX
- Safe execution: Sandbox executor MVP; Capability contracts (curated)
- Earned host autonomy: Initiative Budget v2 + breakers; Scoped host executor + promotions
- Lifecycle formations: Formation manager + Operations formation
- Ecosystem + enterprise: Signed registry + staged rollout; Optional anchoring
Comparison map: OpenClaw traits → VAK implementations
This table directly addresses your PRD’s positioning: not copying OpenClaw’s UI or skill format, but capturing the “thing people love” (initiative, ecosystem, always-on teammate feel) while fixing the trust collapse dynamics.
| OpenClaw trait users love | Why it delights | VAK implementation | Advantage | Main risk |
|---|---|---|---|---|
| Chat-surface presence across tools | Zero-friction teammate feel | Concierge Channels + escalation packages | Preserves delight without making chat the control plane | Channel auth/identity complexity |
| Always-on initiative | Background momentum | Signal Bus + morning briefings + task residue | “Virtual org” feel becomes daily value | Notification fatigue |
| Skills ecosystem | Personalization/network effects | Capability Contracts + signed registry + staged rollout | Structural defense against “malicious skills” class | Governance friction |
| “Agent does real work” | Competence cue | Autonomy Ladder + sandbox-by-default | Prevents overtrust and blast-radius mistakes | Sandbox UX can feel slow |
| Fun observability (Crabwalk effect) | Trust via visibility | Flight Recorder + evidence trail + decision replay | Legibility supports appropriate reliance | Log overload without curation |
| Rapid scale adoption | Community momentum | Trust-first rollout + earned autonomy | Avoids “adoption → incident → ban” loop | May feel conservative early |
Research-anchored success metrics for “metered, explainable, reversible” autonomy
To ensure VAK is not just a concept but measurable:
• Reliance calibration metrics: approval rate by category; override frequency; “never again” rule rate; rung promotions vs demotions; and time-to-intervention during incidents. These align with trust/reliance dynamics discussed in human factors literature.
• Security outcome metrics: contract violation rate; untrusted→privileged tool-call attempts blocked; credential exposure incidents; time-to-revoke a skill; and exposed-control-plane detection/response times (directly tied to observed OpenClaw failure modes).
• Operational value metrics: time-to-first-draft, time-to-merge, regression detection lead time, rollback success, and “overnight actions resolved” (mirroring morning briefing value). These are the “professional autonomy” KPIs that convert trust into adoption without triggering bans.
Key competitive reality: OpenClaw proved the appetite for “autonomy as a teammate,” while the incident timeline proved the inevitability of trust collapse without governance. VAK’s core claim—autonomy earned by evidence, bounded by contracts, executed in isolation, recorded with tamper evidence—is not only coherent, but directly mapped to the exploit chains and mitigations identified by MITRE ATLAS, OWASP LLM security guidance, OpenClaw’s own hardening posture, and transparency log best practices.
Field Type Research-driven note
action_id UUID Stable identity for receipts and replay
trigger_signals list Must include trust tags and source provenance
summary string Human-legible “why” phrasing improves calibrated reliance
plan_steps list Supports intermediate autonomy level design
required_capabilities list Must be contract refs; no “ambient tool access”
risk_profile struct Include data sensitivity + blast radius + reversibility
taint_state enum/struct Derived from signal trust; blocks unsafe edges
rollback_strategy struct Reversibility improves reliance and reduces “ban” impulse
budget_category enum Enables per-category trust (learned trust model)
Field Type Research-driven note
publisher_identity struct Contract provenance fights supply-chain compromise
signatures/attestations list Align to supply-chain provenance norms (SLSA / attestations)
filesystem/network/process scopes struct Least privilege directly mitigates tool abuse
data_classes set Prevents accidental sensitive data handling
side_effects set Enables budget/risk computation and audit reviews
limits struct Mitigates Model DoS / runaway costs (OWASP)
minimum_rung int Enforces intermediate autonomy progression
sandbox_required bool Isolation-first control reduces blast radius
Field Type Why it matters
budget_snapshot struct Enables “why could it act?” replay and audit
taint_snapshot struct Proves whether untrusted inputs influenced execution
tool_calls list Must record contract ref per call (non-repudiation)
sealed_credential_refs list Supports credential-hygiene against infostealer targeting
sandbox_provenance struct Critical for containment and forensics
merkle_leaf, signature, root_epoch bytes/metadata Enables tamper evidence and external anchoring patterns
VAK module PAS hook Gateway hook Web Publisher hook HMI hook
Signal Bus Subscribe to run lifecycle events; CI/test telemetry ingestion Subscribe to chat/approval events Subscribe to build/publish/diff events Subscribe to user feedback, overrides
Initiative Compiler Task decomposition + run-pack synthesis Escalation packaging for Concierge Channels Candidate generation for publishing workflows Explainable “why this candidate exists” UI
Initiative Budget Per-category ledgers tied to run outcomes Human approvals/rejections adjust trust Deployment outcomes feed budget success rate Budget dashboards, override controls
Autonomy Ladder Enforce rung gating on tool execution Approval flows + replay protection Promote staged deploys via rung gating Progressive disclosure of actions
Capability Contracts Bind tools to contracts; enforce at invocation Policy injection for channel-sensitive actions Publishing actions constrained by contract Contract review and install UX
Receipt Store Run receipts as first-class artifacts Concierge approvals stored as receipt artifacts Publish receipts after deploy/publish Evidence trail and decision replay
Component Complexity Why (in practice) Major failure mode
Receipt Store (Merkle + signing) Medium Straightforward crypto + storage patterns exist Key management and rotation
Flight Recorder UI Medium–High Requires “answerability” UX, not raw logs Overwhelming noise → distrust
Capability Contracts + Governance High Supply-chain risk is real and immediate Bottlenecks and dev friction
Initiative Budget High Trust calibration is hard; must resist gaming Autonomy inflation or over-conservatism
Signal Sanitization Medium Must treat external content as hostile False negatives cause unsafe tool use
Sandbox Executor High Isolation + mounts + egress logs are nontrivial Escapes/misconfig; developer pain
Phase Deliverables Max rung enabled Success metrics (examples)
Trust foundation Receipts + Flight Recorder v1 + redaction + sealed-credential references 0–1 ≥90% actions traceable end-to-end; 0 critical secret leaks in receipts (audited)
Bounded initiative Signal Bus (top sources) + Candidate generation + Suggest/Draft flows 2 ≥30% reduction in time-to-first-draft; ≥60% “useful” rating on briefings
Safe execution Sandbox executor + Capability Contracts (curated) + contract enforcement 3 ≥95% sandbox runs reproducible; ≥99% rollback success in sandbox
Earned host access Budget v2 (decay/crash/circuit breakers) + scoped host executor + promotion UX 4 0 high-severity incidents from rung-4 actions; mean time-to-approve decreases without incident rise
Full lifecycle formations Formation manager + Operations formation (briefings, residue, triage) 4 30-day continuous run without P0 autonomy failure; triage accuracy ≥85%
Ecosystem + enterprise Signed skill registry + staged rollout + revocation + optional anchoring 4 Mean time to revoke malicious skill <30 min; external audit verifies receipt integrity
OpenClaw trait users love Why it delights VAK implementation Advantage Main risk
Chat-surface presence across tools Zero-friction teammate feel Concierge Channels + escalation packages Preserves delight without making chat the control plane Channel auth/identity complexity
Always-on initiative Background momentum Signal Bus + morning briefings + task residue “Virtual org” feel becomes daily value Notification fatigue
Skills ecosystem Personalization/network effects Capability Contracts + signed registry + staged rollout Structural defense against “malicious skills” class Governance friction
“Agent does real work” Competence cue Autonomy Ladder + sandbox-by-default Prevents overtrust and blast-radius mistakes Sandbox UX can feel slow
Fun observability (Crabwalk effect) Trust via visibility Flight Recorder + evidence trail + decision replay Legibility supports appropriate reliance Log overload without curation
Rapid scale adoption Community momentum Trust-first rollout + earned autonomy Avoids “adoption → incident → ban” loop May feel conservative early
The PRD’s ActionCandidate, CapabilityContract, and Receipt models are directionally strong. The research suggests two refinements:
• Add _first-class provenance and trust labels_ on any field that can be influenced by untrusted inputs (signals, memory, retrieved context), because MITRE flags undifferentiated memory as a core vulnerability.
• Add explicit “tool-call taint barriers” so that any candidate derived from untrusted sources cannot reach high-privilege tools without a policy- and budget-verified promotion step, consistent with OWASP’s Prompt Injection risk.
ActionCandidate
| Field | Type | Research-driven note |
| --- | --- | --- |
| action_id | UUID | Stable identity for receipts and replay |
| trigger_signals | list | Must include trust tags and source provenance |
| summary | string | Human-legible "why" phrasing improves calibrated reliance |
| plan_steps | list | Supports intermediate autonomy level design |
| required_capabilities | list | Must be contract refs; no "ambient tool access" |
| risk_profile | struct | Include data sensitivity + blast radius + reversibility |
| taint_state | enum/struct | Derived from signal trust; blocks unsafe edges |
| rollback_strategy | struct | Reversibility improves reliance and reduces "ban" impulse |
| budget_category | enum | Enables per-category trust (learned trust model) |
CapabilityContract
| Field | Type | Research-driven note |
| --- | --- | --- |
| publisher_identity | struct | Contract provenance fights supply-chain compromise |
| signatures/attestations | list | Align to supply-chain provenance norms (SLSA / attestations) |
| filesystem/network/process scopes | struct | Least privilege directly mitigates tool abuse |
| data_classes | set | Prevents accidental sensitive data handling |
| side_effects | set | Enables budget/risk computation and audit reviews |
| limits | struct | Mitigates Model DoS / runaway costs (OWASP) |
| minimum_rung | int | Enforces intermediate autonomy progression |
| sandbox_required | bool | Isolation-first control reduces blast radius |
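The contract fields only matter if they are evaluated at every tool invocation. The following sketch shows that enforcement shape; the scope representations (host allowlist, path prefixes) and the `authorize` signature are assumptions for illustration, not the kernel's API.

```python
from dataclasses import dataclass

@dataclass
class CapabilityContract:
    # Field names follow the table above; scope/limit shapes are assumed.
    publisher_identity: str
    network_scopes: frozenset    # allowed egress hosts
    filesystem_scopes: frozenset # allowed path prefixes
    minimum_rung: int
    sandbox_required: bool
    limits: dict                 # e.g. {"max_calls_per_hour": 20}

def authorize(contract, current_rung, in_sandbox, host, path):
    """Least-privilege gate evaluated at every tool invocation."""
    if current_rung < contract.minimum_rung:
        return False, "autonomy rung below contract minimum"
    if contract.sandbox_required and not in_sandbox:
        return False, "contract requires sandbox execution"
    if host not in contract.network_scopes:
        return False, f"egress to {host} outside contract scope"
    if not any(path.startswith(p) for p in contract.filesystem_scopes):
        return False, f"path {path} outside contract scope"
    return True, "ok"
```

Returning a reason string alongside the decision matters for the Flight Recorder: every denial becomes a legible, auditable event rather than a silent failure.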
Receipt
Receipts should be implemented as an append-only Merkle log, similar to transparency log patterns: a signed root commits to all entries and enables inclusion/consistency proofs. The PRD’s chain-integrity approach (“previous_receipt_hash”) is compatible with this design.
| Field | Type | Why it matters |
| --- | --- | --- |
| budget_snapshot | struct | Enables "why could it act?" replay and audit |
| taint_snapshot | struct | Proves whether untrusted inputs influenced execution |
| tool_calls | list | Must record contract ref per call (non-repudiation) |
| sealed_credential_refs | list | Supports credential hygiene against infostealer targeting |
| sandbox_provenance | struct | Critical for containment and forensics |
| merkle_leaf, signature, root_epoch | bytes/metadata | Enables tamper evidence and external anchoring patterns |
Optional anchoring: proven patterns, not “crypto for vibes”
The PRD’s “tamper-evident anchoring” language matches established transparency approaches:
• Certificate Transparency (RFC 6962) uses a Merkle tree with signed tree heads and supports inclusion/consistency proofs.
• Sigstore’s Rekor describes an append-only transparency log whose validity can be cryptographically verified, with periodically signed Merkle roots.
Those patterns support VAK’s claim: audit integrity can be achieved without putting sensitive payloads on-chain, and anchoring becomes a periodic “root notarization” step.
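The Merkle-log commitment is small enough to sketch directly. This is a simplified illustration, not RFC 6962 itself: real transparency logs add domain separation between leaf and node hashes and split trees at the largest power of two, while this sketch pads odd levels by duplicating the last node. The chained-hash variant mirrors the PRD's previous_receipt_hash approach.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Commit to all receipt hashes with a single root.
    (RFC 6962 adds leaf/node domain-separation prefixes; omitted here.)"""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # pad odd levels by duplicating the last node
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def chain(receipts):
    """PRD-style chain integrity: each entry commits to its predecessor."""
    prev = b"\x00" * 32
    out = []
    for r in receipts:
        prev = h(prev + r)
        out.append(prev)
    return out
```

Signing the root each epoch and publishing it (the "root notarization" step) is what turns this from a local hash chain into externally verifiable tamper evidence.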
```mermaid
sequenceDiagram
    participant AK as Autonomy Kernel
    participant RS as Receipt Store (Merkle Log)
    participant FR as Flight Recorder
    participant EA as External Anchoring (optional)
    AK->>RS: append(receipt_hash, receipt_signature)
    RS->>RS: update_merkle_tree()
    RS->>FR: publish(SignedTreeHead + inclusion proofs)
    alt anchoring enabled
        RS->>EA: publish(SignedTreeHead root)
        EA-->>RS: anchor_ref (tx/log entry)
    end
```
Where VAK hooks into Verdict
The following is a concrete, Verdict-native integration map (as requested). It is written as design intent, not as a claim about current implementation.
| VAK module | PAS hook | Gateway hook | Web Publisher hook | HMI hook |
| --- | --- | --- | --- | --- |
| Signal Bus | Subscribe to run lifecycle events; CI/test telemetry ingestion | Subscribe to chat/approval events | Subscribe to build/publish/diff events | Subscribe to user feedback, overrides |
| Initiative Compiler | Task decomposition + run-pack synthesis | Escalation packaging for Concierge Channels | Candidate generation for publishing workflows | Explainable "why this candidate exists" UI |
| Initiative Budget | Per-category ledgers tied to run outcomes | Human approvals/rejections adjust trust | Deployment outcomes feed budget success rate | Budget dashboards, override controls |
| Autonomy Ladder | Enforce rung gating on tool execution | Approval flows + replay protection | Promote staged deploys via rung gating | Progressive disclosure of actions |
| Capability Contracts | Bind tools to contracts; enforce at invocation | Policy injection for channel-sensitive actions | Publishing actions constrained by contract | Contract review and install UX |
| Receipt Store | Run receipts as first-class artifacts | Concierge approvals stored as receipt artifacts | Publish receipts after deploy/publish | Evidence trail and decision replay |
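The common pattern across the hook columns is that every subsystem publishes into, or subscribes from, the Signal Bus with a trust tag attached at ingestion. A minimal pub/sub sketch makes that concrete; the topic name and event shape here are illustrative, not the actual Verdict event schema.

```python
from collections import defaultdict

class SignalBus:
    """Minimal pub/sub sketch. Topic names mirror the hook table above
    and are assumptions, not Verdict's real event names."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, payload, trust="untrusted"):
        # Every signal carries a trust tag at ingestion, so downstream
        # taint policy can reason about provenance by construction.
        event = {"topic": topic, "trust": trust, **payload}
        for handler in self._subs[topic]:
            handler(event)

bus = SignalBus()
seen = []
bus.subscribe("pas.run.finished", seen.append)
bus.publish("pas.run.finished",
            {"run_id": "r1", "status": "green"}, trust="internal")
```

Defaulting `trust` to "untrusted" is deliberate: a publisher that forgets to classify its source degrades safely rather than silently granting privilege.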
Security and privacy tradeoffs
The research indicates several “professional autonomy” tradeoffs that should be made explicit in PRD→architecture work:
• Hash-only receipts improve privacy but can impair debugging. Consider a dual-layer model: redacted hashes for the append-only receipt log, and a separately access-controlled “forensic vault” that stores encrypted, role-gated artifacts for legitimate investigations. The infostealer targeting trend makes it risky to store raw tokens/args in ordinary logs.
• Sandboxing reduces blast radius but increases operational complexity and cold-start latency. Moltworker’s docs note 1–2 minute cold starts for containerized environments, reinforcing that “always-on” can be expensive unless lifecycle/hibernation and caching are engineered.
• Signal sanitization cannot be perfect; therefore “taint barriers + rung gating” must be the fail-safe. MITRE highlights prompt injection and config manipulation as recurring techniques; OWASP standardizes prompt injection as a top risk.
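The dual-layer receipt model in the first bullet can be sketched as a single split-and-hash step. This is an illustration under assumptions: the sensitive-key list is hypothetical, and a production design would encrypt and role-gate the vault half and add a salt or commitment so hashed payloads are not guessable by dictionary attack.

```python
import hashlib
import json

def redacted_receipt_entry(receipt: dict, sensitive_keys=("token", "args")):
    """Split a receipt into (a) a hash-only entry safe for the append-only
    log and (b) the sensitive remainder destined for an access-controlled
    'forensic vault' (encryption/role gating omitted from this sketch)."""
    public = {k: v for k, v in receipt.items() if k not in sensitive_keys}
    vault = {k: v for k, v in receipt.items() if k in sensitive_keys}
    # Canonicalize before hashing so the commitment is reproducible.
    canonical = json.dumps(receipt, sort_keys=True).encode()
    public["payload_hash"] = hashlib.sha256(canonical).hexdigest()
    return public, vault
```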
Implementation complexity (validated by ecosystem lessons)
OpenClaw’s “security runbook” and the copycat trend suggest users reward safety primitives, but only if they are operable and don’t destroy UX.
| Component | Complexity | Why (in practice) | Major failure mode |
| --- | --- | --- | --- |
| Receipt Store (Merkle + signing) | Medium | Straightforward crypto + storage patterns exist | Key management and rotation |
| Flight Recorder UI | Medium–High | Requires "answerability" UX, not raw logs | Overwhelming noise → distrust |
| Capability Contracts + Governance | High | Supply-chain risk is real and immediate | Bottlenecks and dev friction |
| Initiative Budget | High | Trust calibration is hard; must resist gaming | Autonomy inflation or over-conservatism |
| Signal Sanitization | Medium | Must treat external content as hostile | False negatives cause unsafe tool use |
| Sandbox Executor | High | Isolation + mounts + egress logs are nontrivial | Escapes/misconfig; developer pain |
MVP roadmap: trust-first ordering and success metrics
The “trust-first ordering” in the PRD is supported by both empirical incidents (incidents precede bans) and the trust/reliance research (opacity drives misuse/disuse).
Phased roadmap table
| Phase | Deliverables | Max rung enabled | Success metrics (examples) |
| --- | --- | --- | --- |
| Trust foundation | Receipts + Flight Recorder v1 + redaction + sealed-credential references | 0–1 | ≥90% actions traceable end-to-end; 0 critical secret leaks in receipts (audited) |
| Bounded initiative | Signal Bus (top sources) + candidate generation + Suggest/Draft flows | 2 | ≥30% reduction in time-to-first-draft; ≥60% "useful" rating on briefings |
| Safe execution | Sandbox executor + Capability Contracts (curated) + contract enforcement | 3 | ≥95% sandbox runs reproducible; ≥99% rollback success in sandbox |
| Earned host access | Budget v2 (decay/crash/circuit breakers) + scoped host executor + promotion UX | 4 | 0 high-severity incidents from rung-4 actions; mean time-to-approve decreases without incident rise |
| Full lifecycle formations | Formation manager + Operations formation (briefings, residue, triage) | 4 | 30-day continuous run without P0 autonomy failure; triage accuracy ≥85% |
| Ecosystem + enterprise | Signed skill registry + staged rollout + revocation + optional anchoring | 4 | Mean time to revoke malicious skill <30 min; external audit verifies receipt integrity |
Timeline diagram (mermaid)
(Rendered Gantt chart, "VAK rollout (trust-first)", spanning roughly twelve months, Mar–Mar; phases run in sequence:)
• Trust infrastructure: Receipts + redaction + sealed refs; Flight Recorder v1
• Bounded initiative: Signal Bus + candidate gen; Suggest/Draft UX
• Safe execution: Sandbox executor MVP; Capability contracts (curated)
• Earned host autonomy: Initiative Budget v2 + breakers; Scoped host executor + promotions
• Lifecycle formations: Formation manager + Ops formation
• Ecosystem + enterprise: Signed registry + staged rollout; Optional anchoring
Comparison map: OpenClaw traits → VAK implementations
This table directly addresses your PRD’s positioning: not copying OpenClaw’s UI or skill format, but capturing the “thing people love” (initiative, ecosystem, always-on teammate feel) while fixing the trust collapse dynamics.
| OpenClaw trait users love | Why it delights | VAK implementation | Advantage | Main risk |
| --- | --- | --- | --- | --- |
| Chat-surface presence across tools | Zero-friction teammate feel | Concierge Channels + escalation packages | Preserves delight without making chat the control plane | Channel auth/identity complexity |
| Always-on initiative | Background momentum | Signal Bus + morning briefings + task residue | "Virtual org" feel becomes daily value | Notification fatigue |
| Skills ecosystem | Personalization/network effects | Capability Contracts + signed registry + staged rollout | Structural defense against "malicious skills" class | Governance friction |
| "Agent does real work" | Competence cue | Autonomy Ladder + sandbox-by-default | Prevents overtrust and blast-radius mistakes | Sandbox UX can feel slow |
| Fun observability (Crabwalk effect) | Trust via visibility | Flight Recorder + evidence trail + decision replay | Legibility supports appropriate reliance | Log overload without curation |
| Rapid scale adoption | Community momentum | Trust-first rollout + earned autonomy | Avoids "adoption → incident → ban" loop | May feel conservative early |
Research-anchored success metrics for “metered, explainable, reversible” autonomy
To ensure VAK is not just a concept but measurable:
• Reliance calibration metrics: approval rate by category; override frequency; “never again” rule rate; rung promotions vs demotions; and time-to-intervention during incidents. These align with trust/reliance dynamics discussed in human factors literature.
• Security outcome metrics: contract violation rate; untrusted→privileged tool-call attempts blocked; credential exposure incidents; time-to-revoke a skill; and exposed-control-plane detection/response times (directly tied to observed OpenClaw failure modes).
• Operational value metrics: time-to-first-draft, time-to-merge, regression detection lead time, rollback success, and “overnight actions resolved” (mirroring morning briefing value). These are the “professional autonomy” KPIs that convert trust into adoption without triggering bans.
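The reliance-calibration metrics in the first bullet are straightforward to compute from decision events. The event shape below ('category' plus an 'outcome' in approved/overridden/never_again) is a hypothetical schema chosen for illustration.

```python
def reliance_metrics(decisions):
    """Compute per-category reliance-calibration rates from decision
    events shaped like {'category': ..., 'outcome': ...} (assumed schema)."""
    per_cat = {}
    for d in decisions:
        c = per_cat.setdefault(d["category"], {"approved": 0, "overridden": 0,
                                               "never_again": 0, "total": 0})
        c[d["outcome"]] += 1
        c["total"] += 1
    return {
        cat: {
            "approval_rate": c["approved"] / c["total"],
            "override_rate": c["overridden"] / c["total"],
            "never_again_rate": c["never_again"] / c["total"],
        }
        for cat, c in per_cat.items()
    }
```

Tracking these per category rather than globally is what lets the Initiative Budget grant (or withdraw) autonomy where trust has actually been earned.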
Key competitive reality: OpenClaw proved the appetite for “autonomy as a teammate,” while the incident timeline proved the inevitability of trust collapse without governance. VAK’s core claim—autonomy earned by evidence, bounded by contracts, executed in isolation, recorded with tamper evidence—is not only coherent, but directly mapped to the exploit chains and mitigations identified by MITRE ATLAS, OWASP LLM security guidance, OpenClaw’s own hardening posture, and transparency log best practices.