Claude Code vs Verdict Code: Comprehensive Comparison

Trent Carter + Claude
2026-01-14
docs/research/COMPARISON_Claude_Code_Vs_Verdict_Code.md

Document Purpose: Technical comparison of Claude Code and Verdict Code as coding agent frameworks
Last Updated: 2026-01-14
Version: 1.0
Framework Version: CABF (Coding Agent Benchmark Framework) v1.0

Executive Summary

Claude Code and Verdict Code represent two distinct approaches to AI-powered coding assistance. Claude Code, developed by Anthropic, is a commercial CLI tool optimized for Claude models with a focus on developer experience and ease of use. Verdict Code is an open-source framework designed for flexibility, multi-model support, and integration with broader infrastructure including Gateway services, memory systems, and microservices architecture.

The fundamental architectural difference lies in their deployment models: Claude Code operates as a standalone CLI with direct API integration to Anthropic's services, while Verdict Code uses a Gateway-based architecture that abstracts model access, enabling multi-provider support, dynamic model routing, and unified cost tracking. This makes Verdict Code more suitable for enterprise deployments requiring cost management, multi-model strategies, and integration with existing development infrastructure.

In terms of capabilities, both frameworks support tool use, file operations, and multi-step reasoning through ReAct loops. However, Verdict Code extends beyond Claude Code with features like agentic memory with graceful degradation, context compaction for long-running sessions, multi-agent coordination via sub-agents, MCP (Model Context Protocol) integration, and a comprehensive skill management system. Claude Code excels in polished user experience, optimized Claude integration, and simpler setup for teams already using Anthropic's models.

For developers choosing between these frameworks, the decision hinges on specific needs: choose Claude Code for streamlined Claude-focused development with minimal infrastructure overhead; choose Verdict Code for multi-model flexibility, enterprise cost management, advanced memory and coordination features, or when building custom AI development platforms. The CABF framework provides objective performance metrics through standardized benchmarks, enabling data-driven decision making based on actual task performance rather than feature lists.


Detailed Comparison Table

Core Architecture

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Deployment Model | Standalone CLI tool | Gateway-based architecture | Claude Code runs independently; Verdict Code requires Gateway (port 6120) for model access |
| Primary Interface | Command-line interface (CLI) | Python library + CLI | Verdict Code can be imported as Python package; Claude Code is CLI-only |
| Integration Method | Subprocess execution (via adapter) | Direct Python integration | Verdict Code adapter uses direct Agent class calls (621 lines); Claude Code uses subprocess (353 lines) |
| Codebase Size | Proprietary (not visible) | ~15,000+ lines across modules | Verdict Code is fully open-source and extensible |
| Architecture Type | Monolithic CLI | Microservices-oriented | Verdict Code integrates with Gateway, Telemetry, Memory, and Skills services |
| Configuration Source | CLI arguments + config files | AgentConfig dataclass + environment | Verdict Code uses Python dataclasses for type-safe configuration |
| Session Management | Built into CLI | SessionManager component | Verdict Code has explicit session models with persistence |
| State Management | Internal to CLI | AgentState enum (IDLE, THINKING, TOOL_USE, ERROR, COMPLETE) | Verdict Code exposes explicit state transitions |
| Extensibility | Limited to provided features | Highly extensible via custom commands, hooks, skills | Verdict Code supports user/project-level custom commands |
| Dependencies | Python runtime + Anthropic SDK | Python + Gateway + optional services | Verdict Code can function with degraded services |
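To illustrate the configuration difference, a dataclass-based configuration like Verdict Code's AgentConfig might look roughly like the sketch below. The field names other than the max-turns, timeout, context-size, and compaction defaults cited in this document are illustrative assumptions, not the actual API.

```python
import os
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Sketch of a type-safe agent configuration.

    Only max_turns, timeout_ms, max_context_tokens, and
    compact_threshold defaults follow values cited in this document;
    the remaining fields are assumptions for illustration.
    """
    model: str = "claude-sonnet-4-5"
    gateway_url: str = field(
        default_factory=lambda: os.environ.get(
            "VERDICT_GATEWAY_URL", "http://localhost:6120"
        )
    )
    max_turns: int = 100              # DEFAULT_MAX_TURNS
    timeout_ms: int = 120_000         # DEFAULT_TIMEOUT_MS (2 minutes)
    max_context_tokens: int = 8192
    compact_threshold: float = 0.90   # auto-compact at 90% of context


config = AgentConfig(model="claude-sonnet-4-5")
```

Because the configuration is a dataclass, invalid field names fail loudly at construction time rather than being silently ignored, which is the main practical advantage over untyped config files.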

Agent Capabilities

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Tool Use Support | Native Claude tool use | Conditional (requires capable model) | Verdict Code requires models with tool support (e.g., claude-sonnet-4-5) |
| Multi-file Editing | Supported via Edit tool | Supported via Edit tool | Both use string replacement with validation |
| Background Tasks | Unknown (not exposed in adapter) | Supported via Bash run_in_background parameter | Verdict Code tracks background tasks with threading.Lock |
| Custom Commands | Not available | Yes - user-level (~/.verdict/commands/) and project-level (.verdict/commands/) | Verdict Code has full command discovery and loader system |
| Hooks System | Not available | Yes - pre/post execution hooks | Verdict Code has HookRegistry and HookExecutor |
| MCP Integration | Unknown | Yes - MCP client and registry | Verdict Code supports Model Context Protocol servers |
| Checkpointing | Unknown | Not explicitly implemented | Verdict Code has rewind/resume commands for state recovery |
| Vision Support | Model-dependent | Model-dependent (configurable) | Both rely on underlying model capabilities |
| Task Tool (Sub-agents) | Not available | Yes - spawns specialized sub-agents (EXPLORE, PLAN, BASH, GENERAL) | Verdict Code SubAgent class with specialized prompts |
| Skill Routing | Not available | Yes - skill-aware routing for optimal model selection | Integrated with agents registry for cost optimization |
| Custom Agent Types | Not available | Yes - SubagentConfig for custom agent definitions | Supports custom system prompts and tool access |
| ReAct Loop Implementation | Internal to Claude Code | Explicit generator yielding TurnResult objects | Verdict Code exposes turn-by-turn execution via Agent.chat() |
| Max Turns Configuration | Unknown | Yes - DEFAULT_MAX_TURNS = 100, configurable via AgentConfig | Verdict Code prevents infinite loops |
| Timeout Handling | Yes | Yes - DEFAULT_TIMEOUT_MS = 120,000ms (2 minutes), configurable | Both support task-level timeouts |
| Streaming Output | Yes (via CLI) | Yes - via callback system (on_text, on_tool_start, on_tool_end) | Verdict Code supports custom output formatters (text, JSON, stream-JSON) |
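The "explicit generator yielding TurnResult objects" row is worth unpacking: a generator-based ReAct loop yields one result per model turn, which is what lets callers observe execution turn by turn. The sketch below is a simplified assumption of that shape (TurnResult fields and the stop condition are illustrative; tool execution is omitted).

```python
from dataclasses import dataclass, field


@dataclass
class TurnResult:
    """One turn of the ReAct loop (field names are illustrative)."""
    turn: int
    text: str
    tool_calls: list = field(default_factory=list)
    done: bool = False


def chat(prompt, call_model, max_turns=100):
    """Simplified ReAct loop: yield one TurnResult per model turn.

    `call_model` stands in for the Gateway round-trip. A real
    implementation would also execute requested tools and feed the
    tool results back into the conversation history.
    """
    history = [{"role": "user", "content": prompt}]
    for turn in range(1, max_turns + 1):
        text, tool_calls = call_model(history)
        done = not tool_calls  # no tool use requested -> final answer
        yield TurnResult(turn=turn, text=text, tool_calls=tool_calls, done=done)
        if done:
            return
        history.append({"role": "assistant", "content": text})
        # (tool execution and result messages omitted in this sketch)


# Usage: a fake model that uses one tool, then answers.
responses = iter([("checking...", [{"name": "Read"}]), ("done", [])])
turns = list(chat("fix the bug", lambda history: next(responses)))
```

The max_turns bound on the loop is what prevents a misbehaving model from spinning forever, matching the DEFAULT_MAX_TURNS = 100 guard noted in the table.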

Tool Support

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Bash Tool | Yes | Yes - with persistent shell session | Verdict Code tracks background tasks and supports timeout/description |
| Read Tool | Yes | Yes - with offset/limit parameters | Default reads 2000 lines, configurable |
| Write Tool | Yes | Yes - creates or overwrites files | Verdict Code requires Read before Edit (validation) |
| Edit Tool | Yes | Yes - exact string replacement | Verdict Code tracks files_read set for validation |
| Glob Tool | Yes | Yes - pattern-based file search | Both support `**/*.py`-style patterns |
| Grep Tool | Yes | Yes - ripgrep-compatible search | Verdict Code supports output modes: content, files_with_matches, count |
| TodoWrite Tool | Yes | Yes - task tracking with status | Verdict Code has active_form for display |
| AskUserQuestion Tool | Yes | Yes - interactive user input | Verdict Code supports optional answers |
| Task Tool (Sub-agents) | No | Yes - spawns specialized agents | Unique to Verdict Code |
| Tool Schema Format | Anthropic-compatible | Anthropic-compatible (via ToolSchema.to_anthropic_schema()) | Both use standard Anthropic tool format |
| Tool Registry | Internal | ToolRegistry class with get_schemas() and execute() | Verdict Code has extensible tool registration |
| Tool Execution Tracking | Yes | Yes - TelemetryCollector tracks tool calls | Verdict Code records duration_ms, success, parameters |
| Custom Tools | Not supported | Can extend ToolRegistry | Verdict Code architecture allows custom tool additions |
| Tool Timeout Handling | Yes | Yes - per-tool timeout parameter | Default 2 minutes, max 10 minutes in Verdict Code |
| Tool Error Recovery | Yes | Yes - ToolResult includes error content | Both continue execution after tool failures |
| Background Tool Execution | Unknown | Yes - Bash.run_in_background with TaskOutput retrieval | Verdict Code uses threading.Lock for task tracking |
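An extensible tool registry of the kind described in the Tool Registry row — get_schemas() for the model request, execute() for dispatch — could be sketched as follows. The method names come from this document; the registration API, schema shape, and error convention are assumptions.

```python
class ToolRegistry:
    """Minimal sketch of an extensible tool registry.

    get_schemas()/execute() follow the method names cited in this
    comparison; everything else here is an illustrative assumption.
    """

    def __init__(self):
        self._tools = {}

    def register(self, name, schema, handler):
        """Add a custom tool: an Anthropic-style schema plus a callable."""
        self._tools[name] = (schema, handler)

    def get_schemas(self):
        """Return Anthropic-compatible tool schemas for the model request."""
        return [{"name": name, **schema}
                for name, (schema, _) in self._tools.items()]

    def execute(self, name, **params):
        """Dispatch a tool call; unknown tools return an error payload."""
        if name not in self._tools:
            return {"error": f"unknown tool: {name}"}
        _, handler = self._tools[name]
        return handler(**params)


registry = ToolRegistry()
registry.register(
    "Glob",
    {"description": "Pattern-based file search",
     "input_schema": {"type": "object",
                      "properties": {"pattern": {"type": "string"}}}},
    lambda pattern: {"matches": []},  # stub handler for illustration
)
```

Returning an error payload instead of raising is what lets the agent continue after a failed tool call, matching the Tool Error Recovery row above.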

Memory Management

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Context Compaction | Unknown (likely automatic) | Yes - CompactionEngine with auto_compact flag | Triggers at 90% of max_context_tokens by default |
| Token Counting Method | Internal (exact from API) | Optional tiktoken or character ratio (1 token ~ 4 chars) | Verdict Code falls back to char ratio if tiktoken unavailable |
| Max Context Tokens | Model-dependent | Configurable (default 8192) | Verdict Code supports models up to 200K tokens (Claude) |
| Context Statistics | Not exposed | Yes - get_context_stats() returns usage, turn_count, token counts | Verdict Code provides visibility into context usage |
| Compaction Strategies | Unknown | Multiple - OldestTurns, Summarization, ToolResultCompaction | Verdict Code balances summary quality vs token savings |
| Auto-compact Trigger | Unknown | Yes - compact_threshold (default 90%) | Verdict Code automatically compacts when threshold exceeded |
| Compaction Callback | Unknown | Yes - on_auto_compact callback for UI updates | Notifies when compaction occurs |
| Agentic Memory | Unknown | Yes - AgenticMemoryClient with graceful degradation | Optional memory service (port 6250) for context persistence |
| Memory Service Types | Not applicable | SAM (service), LOCAL (file-based), DISABLED | Verdict Code falls back to stateless mode if unavailable |
| Context Storage | Not applicable | Yes - store_context() with session_id and tags | Persists conversation context across sessions |
| Context Retrieval | Not applicable | Yes - retrieve_context() returns cached context | Enables session resumption |
| Pattern Learning | Not available | Yes - store_pattern() and retrieve_pattern() | Agents can learn and reuse patterns |
| Memory Graceful Degradation | Not applicable | Yes - returns MemoryResult.degraded=True if unavailable | Logs warnings but continues execution |
| Session Persistence | Unknown | Yes - session/manager.py with SessionManager | Verdict Code supports session resume via /resume command |
| Token Usage Tracking | Yes (via API) | Partial - TelemetryCollector placeholder (TODO: extract from Gateway) | Verdict Code adapter notes token counting needs Gateway integration |
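The token-counting fallback and the auto-compact trigger described above reduce to a few lines. This is a sketch under the stated defaults (tiktoken when available, otherwise the ~4-characters-per-token heuristic; compaction at 90% of an 8192-token window); the function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count: exact via tiktoken when available,
    otherwise the ~4-characters-per-token heuristic."""
    try:
        import tiktoken  # optional dependency
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        # Fallback: roughly 1 token per 4 characters
        return max(1, len(text) // 4)


def should_compact(used_tokens: int,
                   max_context_tokens: int = 8192,
                   compact_threshold: float = 0.90) -> bool:
    """Auto-compaction trigger: fire at 90% of the context window.
    With the default 8192-token window, that is 7372.8 tokens."""
    return used_tokens >= max_context_tokens * compact_threshold
```

Once should_compact fires, a CompactionEngine would apply one of the strategies listed above (oldest-turn eviction, summarization, or tool-result compaction) and notify the UI via on_auto_compact.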

Multi-Agent Coordination

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Sub-agent Support | No | Yes - SubAgent class extends Agent | Verdict Code spawns specialized agents via Task tool |
| Agent Types | N/A | EXPLORE, PLAN, BASH, GENERAL | Each type has specialized system prompt and tool access |
| Sub-agent Configuration | N/A | SubagentConfig - custom system prompts, models, tools | Supports custom agent definitions via agents registry |
| Parent Context Inheritance | N/A | Yes - SubAgent receives parent_context parameter | Sub-agents see parent conversation history |
| Tool Access Control | N/A | Yes - _get_allowed_tools() restricts tools by agent type | EXPLORE: Read/Glob/Grep; BASH: Bash only; GENERAL: all tools |
| Model Selection | Single model per session | Per-agent model selection | Sub-agent can use different model than parent |
| Skill-aware Routing | Not available | Yes - integrated with skill routing for optimal model selection | Phase 6 feature for cost-optimized sub-agent execution |
| Agent Registry | Not available | Yes - SubagentRegistry with Thoroughness levels | Manages agent configurations and capabilities |
| Multi-agent Orchestration | Not available | Manual via Task tool | User explicitly spawns sub-agents for specialized tasks |
| Agent Communication | Not applicable | Via parent_context and tool results | Sub-agents communicate through context passing |
| Agent Lifecycle Management | N/A | Explicit initialization and cleanup | SubAgent inherits close() method from Agent |
| Parallel Agent Execution | Not available | No - sequential execution only | Sub-agents run one at a time within parent session |
| Agent Telemetry | Unknown | Yes - on_retry, on_tool_start, on_tool_end callbacks | Tracks execution at agent level |
| Agent Error Handling | Yes | Yes - try/except with state transition to ERROR | Both handle agent-level errors gracefully |
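The tool-access restrictions in the Tool Access Control row amount to a per-type allowlist, roughly as below. The EXPLORE, BASH, and GENERAL entries follow the restrictions stated in this document; the PLAN toolset is an assumption, since it is not spelled out here.

```python
# Per-agent-type tool allowlist. EXPLORE/BASH/GENERAL follow the
# restrictions cited in this comparison; PLAN's toolset is assumed.
SUBAGENT_TOOLS = {
    "EXPLORE": ["Read", "Glob", "Grep"],
    "PLAN": ["Read", "Glob", "Grep", "TodoWrite"],  # assumption
    "BASH": ["Bash"],
    "GENERAL": ["Bash", "Read", "Write", "Edit", "Glob", "Grep",
                "TodoWrite", "AskUserQuestion", "Task"],
}


def get_allowed_tools(agent_type: str) -> list:
    """Restrict a sub-agent's tool access by its type — a sketch of
    what a _get_allowed_tools()-style helper might do."""
    return SUBAGENT_TOOLS.get(agent_type, SUBAGENT_TOOLS["GENERAL"])
```

Restricting an EXPLORE sub-agent to read-only tools is a cheap safety property: a parent can delegate open-ended codebase exploration without granting write or shell access.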

Cost Tracking

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Credit Cost Tracking | Via Anthropic API | Yes - cost_credits field in AgentResult | Verdict Code Gateway returns cost information |
| Cost Display | Yes (via CLI) | Yes - /cost command shows session costs | Verdict Code has cost command for real-time tracking |
| Multi-model Cost Management | No (Anthropic only) | Yes - Gateway supports multiple providers with unified credits | Verdict Code abstracts cost across providers |
| Credit Multiplier System | Not applicable | Yes - model_catalog table defines credit_multiplier | Free models (byok/, lan/, local/) have multiplier=0.0 |
| Cost SSOT | Anthropic billing | Cloud Gateway (port 6123) | Local Gateway (6120) proxies only, no billing |
| Real-time Cost Updates | Unknown | Yes - via Gateway responses | Telemetry service (port 6122) may track costs |
| Cost Estimation | Via Anthropic pricing | Via pricing SSOT (cloud/config/verdict_master_pricing.json) | Verdict Code never hardcodes prices |
| Budget Limits | Via Anthropic account limits | Yes - max_cost_credits in BenchmarkTask | CABF enforces per-task cost limits |
| Cost Reporting | Via Anthropic dashboard | Via Verdict reports and CLI commands | Verdict Code provides cost breakdown by model/tool |
| Free Model Support | No | Yes - byok/, lan/, local/ prefixes | Local models like Ollama incur zero credit cost |
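The credit-multiplier mechanism described above is simple arithmetic: a per-model multiplier scales the base credit cost, and free prefixes (byok/, lan/, local/) carry a multiplier of 0.0. The sketch below assumes a catalog lookup shape; real multipliers live in the pricing SSOT (cloud/config/verdict_master_pricing.json) and are never hardcoded.

```python
def task_cost_credits(base_credits: float, model_id: str,
                      catalog: dict) -> float:
    """Apply a per-model credit multiplier, as the model_catalog
    table described in this document does. Models absent from the
    catalog are assumed to bill at the standard rate here."""
    multiplier = catalog.get(model_id, 1.0)
    return base_credits * multiplier


# Illustrative catalog entries; real values come from the pricing SSOT.
catalog = {
    "claude-sonnet-4-5": 1.0,
    "local/ollama-llama3": 0.0,  # local/ prefix -> zero credit cost
}
```

This is also where per-task budget enforcement hooks in: CABF can compare the accumulated task_cost_credits total against max_cost_credits and abort the task when the budget is exceeded.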

Error Handling

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Retry Logic | Built into Claude SDK | Yes - RetryConfig with exponential backoff | max_retries=3, base_delay=1.0s, max_delay=60.0s |
| Retryable Errors | Internal | GatewayConnectionError, GatewayTimeoutError, RateLimitError | Verdict Code distinguishes retryable vs non-retryable |
| Non-retryable Errors | Internal | GatewayResponseError (4xx client errors) | Fails immediately without retry |
| Rate Limit Handling | Via Anthropic SDK | Yes - RateLimitError with retry_after header | Honors Gateway's Retry-After header |
| Graceful Shutdown | Unknown | Yes - ShutdownRequested exception with is_shutdown_requested() checks | Checks before each turn and during retry delays |
| Error State Tracking | Yes | Yes - AgentState.ERROR with error message | Both track error state explicitly |
| Error Recovery Counting | Unknown | Yes - recovery_count in AgentResult and TelemetryCollector | Tracks successful recoveries from errors |
| Error Callbacks | Unknown | Yes - on_error callback for UI updates | Notifies listeners of errors |
| Exception Hierarchy | Proprietary | 7 specific exception types | AgentError, GatewayConnectionError, GatewayTimeoutError, GatewayResponseError, RateLimitError, ModelNotFoundError, ShutdownRequested |
| Timeout Handling | Yes | Yes - asyncio.wait_for() with task.timeout_seconds | CABF enforces per-task timeout limits |
| HTTP Error Handling | Via SDK | Explicit status code handling (200, 404, 429, 4xx, 5xx) | Verdict Code parses Gateway error responses |
| JSON Parse Errors | Via SDK | Yes - GatewayResponseError for invalid JSON | Returns truncated response body for debugging |
| Connection Error Handling | Via SDK | Yes - distinguishes ConnectError vs ConnectTimeout vs ReadTimeout | Provides specific error messages |
| Memory Service Degradation | N/A | Yes - graceful degradation with MemoryResult.degraded=True | Continues execution if memory unavailable |
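The retry behavior in the first three rows — exponential backoff for retryable errors, immediate failure for 4xx-style client errors — can be sketched with the cited defaults (max_retries=3, base_delay=1.0s, max_delay=60.0s). The helper name and the use of built-in exception types as stand-ins for the Gateway exception hierarchy are assumptions.

```python
import time


def with_retries(fn, max_retries=3, base_delay=1.0, max_delay=60.0,
                 retryable=(ConnectionError, TimeoutError),
                 sleep=time.sleep):
    """Exponential backoff matching the RetryConfig defaults cited
    above: retryable errors back off 1s, 2s, 4s, ... capped at
    max_delay; any other exception propagates immediately, like the
    non-retryable GatewayResponseError for 4xx client errors."""
    attempt = 0
    while True:
        try:
            return fn()
        except retryable:
            if attempt >= max_retries:
                raise  # retries exhausted
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay)  # a real agent also checks shutdown here
            attempt += 1


# Usage: a call that fails twice with a retryable error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("gateway unreachable")
    return "ok"

delays = []
result = with_retries(flaky, sleep=delays.append)
```

Note the comment in the sleep step: the document states that graceful shutdown is checked before each turn and during retry delays, so a real loop would poll is_shutdown_requested() rather than sleeping blindly.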

Performance Metrics

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Token Usage Tracking | Yes (exact from API) | Partial - placeholder with TODO for Gateway integration | Verdict Code adapter needs token extraction from Gateway |
| Execution Time Tracking | Yes | Yes - start_time, end_time, execution_time in AgentResult | Both track wall-clock time |
| Tool Call Metrics | Yes | Yes - tool_calls list with duration_ms for each call | Verdict Code tracks per-tool timing via TelemetryCollector |
| Success Rate Tracking | Via CABF | Via CABF - success_rates: Dict[str, float] in ComparisonReport | Both support benchmark-level aggregation |
| Average Execution Time | Via CABF | Via CABF - avg_execution_times: Dict[str, float] | Aggregated across benchmark runs |
| Token Efficiency | Via CABF | Via CABF - avg_tokens_per_task: Dict[str, float] | Both track input/output tokens |
| Statistical Analysis | Via CABF | Via CABF - statistically_significant, p_value, confidence_interval | Both support hypothesis testing |
| Performance Profiling | Unknown | Yes - verbose mode with DEBUG output | Verdict Code logs request/response snippets |
| Health Checks | Yes (via --version) | Yes - health_check() verifies Gateway, workspace, model | Verdict Code checks HTTP connectivity to Gateway |
| Metrics Export | Via CABF reports | Via CABF reports - JSON, Markdown, visualizations | Both support multiple output formats |
| Telemetry Integration | Unknown | Yes - on_retry, on_tool_start, on_tool_end callbacks | Verdict Code supports custom telemetry collectors |
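Per-tool-call telemetry of the kind described in the Tool Call Metrics row — duration_ms, success, parameters for each call — is straightforward to collect around the dispatch point. The class name matches this document's TelemetryCollector; its API shape below is an assumption.

```python
import time


class TelemetryCollector:
    """Sketch of per-tool-call telemetry collection. The recorded
    fields (duration_ms, success, parameters) follow this document;
    the record() API itself is an illustrative assumption."""

    def __init__(self):
        self.tool_calls = []

    def record(self, name, fn, **params):
        """Execute a tool handler and record timing and outcome."""
        start = time.perf_counter()
        try:
            result = fn(**params)
            success = True
        except Exception:
            result, success = None, False
        self.tool_calls.append({
            "tool": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
            "success": success,
            "parameters": params,
        })
        return result


collector = TelemetryCollector()
collector.record("Read", lambda file_path: "contents", file_path="a.py")
```

Aggregating these records across a benchmark run is what feeds the CABF-level metrics (success rates, average execution times) listed above.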

Developer Experience

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Setup Complexity | Low (pip install claude) | Medium - requires Gateway stack | Verdict Code needs Gateway, optional services (Memory, Skills) |
| Configuration | CLI args + config file | Python dataclasses + environment variables | Verdict Code uses AgentConfig for type-safe configuration |
| Documentation Quality | Official Anthropic docs | In-repo docs (SPECs, PRDs, howtos) | Verdict Code has extensive but scattered documentation |
| CLI Usability | Polished (Anthropic-designed) | Functional but less polished | Verdict Code prioritizes flexibility over UX polish |
| Output Formats | Text (CLI) | Text, JSON, stream-JSON | Verdict Code supports machine-readable output formats |
| Interactive Features | Yes (native) | Yes - AskUserQuestion tool | Both support interactive user input |
| Session Resumption | Unknown | Yes - /resume command with session persistence | Verdict Code SessionManager loads saved sessions |
| Command Discovery | Built-in help | Yes - /help command with command registry | Verdict Code has custom command loader |
| Custom Commands | Not supported | Yes - user and project-level commands | Verdict Code discovers commands from ~/.verdict/commands/ and .verdict/commands/ |
| IDE Integration | VSCode extension (official) | VSCode extension (in repo) + IDE protocol | Verdict Code has ide/protocol.py and ide/bridge.py |
| Debugging Support | Via CLI output | Verbose mode with DEBUG logging | Verdict Code prints request/response snippets |
| Error Messages | User-friendly | Technical but detailed | Verdict Code provides stack traces and Gateway error details |
| Learning Curve | Shallow | Steeper - requires understanding Gateway, services | Verdict Code is more complex but more powerful |
| Community Support | Anthropic community | Open-source repo (GitHub) | Verdict Code benefits from open-source contributions |

Integration & Extensibility

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| Python API | No (CLI only) | Yes - can import Agent class directly | Verdict Code supports library usage, not just CLI |
| Custom Tool Development | Not supported | Can extend ToolRegistry | Verdict Code architecture allows custom tools |
| Hook System | Not available | Yes - pre/post execution hooks | Verdict Code has HookRegistry and HookExecutor |
| Custom Commands | Not available | Yes - Python-based commands with argparse | Supports command discovery and loading |
| MCP Server Support | Unknown | Yes - MCP client and registry | Verdict Code integrates Model Context Protocol servers |
| Skills System | Not available | Yes - Skills Manager (5 microservices, 42 CLI commands) | Phase 1-5 complete, production-ready |
| Service Integration | Anthropic API only | Gateway, Telemetry, Memory, Skills, RBAC | Verdict Code integrates with microservices architecture |
| Model Provider Support | Anthropic only | Multi-provider via Gateway (OpenAI, local, OpenRouter, etc.) | Verdict Code abstracts provider differences |
| Database Integration | Not applicable | Yes - PostgreSQL (NeonDB), Neo4j, FAISS | Verdict Code supports multiple datastores |
| RBAC Integration | Not applicable | Yes - role-based access control | Verdict Code has RBAC service |
| Web UI Integration | Not available | Yes - HMI (Human-Machine Interface) | Verdict Code has services/webui/hmi_app.py |
| API Gateway Integration | Not applicable | Yes - Local (6120) and Cloud (6123) Gateways | Verdict Code routes through Gateway for model access |
| Extensibility Model | Closed (Anthropic-controlled) | Open - add custom commands, hooks, tools, skills | Verdict Code designed for extensibility |
| Plugin Architecture | Not available | Yes - custom commands, MCP servers, skills | Verdict Code supports multiple extension mechanisms |
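The pre/post execution hooks in the Hook System row are a conventional interception pattern: run registered callbacks before and after each tool execution. The sketch below uses the HookRegistry name from this document, but the registration and dispatch API is an assumption.

```python
class HookRegistry:
    """Sketch of pre/post execution hooks. HookRegistry and the
    pre/post split follow this document; this API is illustrative."""

    def __init__(self):
        self.pre, self.post = [], []

    def on_pre(self, fn):
        """Register a hook called before every tool execution."""
        self.pre.append(fn)

    def on_post(self, fn):
        """Register a hook called after every tool execution."""
        self.post.append(fn)

    def run(self, tool_name, execute, params):
        """Wrap one tool execution with all registered hooks."""
        for hook in self.pre:
            hook(tool_name, params)
        result = execute(**params)
        for hook in self.post:
            hook(tool_name, result)
        return result


hooks = HookRegistry()
log = []
hooks.on_pre(lambda tool, params: log.append(f"pre:{tool}"))
hooks.on_post(lambda tool, result: log.append(f"post:{tool}"))
out = hooks.run("Bash", lambda command: "ok", {"command": "ls"})
```

Hooks of this shape are how policy concerns (audit logging, command filtering, cost accounting) can be layered onto every tool call without modifying the tools themselves.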

Ecosystem

| Feature/Category | Claude Code | Verdict Code | Notes/Context |
|---|---|---|---|
| License | Commercial (Anthropic) | Open-source (in Verdict repo) | Verdict Code is part of larger Verdict platform |
| Development Model | Closed-source (Anthropic) | Open-source with active development | Verdict Code has frequent commits and feature additions |
| Dependencies | Anthropic SDK | Python + Gateway services | Verdict Code has more dependencies but more capabilities |
| Testing Infrastructure | Internal (Anthropic) | pytest with unit and integration tests | Verdict Code has test_concurrent_operations.py, test_services_integration.py, test_stress_tests.py |
| Benchmarking | Not standardized | Yes - CABF (Coding Agent Benchmark Framework) | Verdict Code has standardized benchmark suite |
| Model Support | Claude models only | Multi-model via Gateway (Claude, GPT, local, etc.) | Verdict Code supports any model in Gateway catalog |
| Documentation Style | Official docs | In-repo SPECs, PRDs, howtos, quickrefs | Verdict Code has comprehensive but technical docs |
| Update Mechanism | Via pip/claude CLI | Via git pull | Verdict Code updates manually or via git |
| Community Contributions | Not accepted | Accepted via GitHub PRs | Verdict Code benefits from open-source community |
| Commercial Support | Anthropic support | Community/self-supported | Verdict Code relies on community for support |

Key Insights

Architectural Philosophy

Claude Code follows a vertically integrated approach: Anthropic controls the entire stack from model to CLI, ensuring optimized performance and user experience but limiting extensibility. This is ideal for teams that want a "just works" solution focused on Claude models.

Verdict Code adopts a horizontally integrated architecture: Gateway abstracts model access, services provide specialized capabilities (memory, skills, telemetry), and the agent framework is decoupled from specific models. This enables multi-model strategies, cost optimization, and enterprise integration but requires more infrastructure.

Capability Gaps

Unique to Verdict Code:
  • Agentic Memory - Persistent context storage with graceful degradation
  • Sub-agents - Specialized agents for exploration, planning, and execution
  • Context Compaction - Automatic token management for long-running sessions
  • Skill Routing - Intelligent model selection based on task requirements
  • MCP Integration - Standard protocol for model context servers
  • Custom Commands - User and project-level command extensions
  • Multi-provider Support - Unified interface to OpenAI, Anthropic, local models, etc.
  • Cost Management - Per-task budget limits and credit tracking
  • Service Integration - Telemetry, RBAC, Memory, Skills services

Unique to Claude Code:
  • Polished UX - Optimized developer experience out of the box
  • Simplicity - No infrastructure beyond Python runtime
  • Official Support - Backed by Anthropic
  • Claude Optimization - Tailored specifically for Claude models

Performance Considerations

The CABF adapters reveal implementation differences:

  • Verdict Code adapter (621 lines) uses direct Python integration with the Agent class, enabling richer telemetry and control
  • Claude Code adapter (353 lines) uses subprocess execution, treating Claude Code as a black box

This suggests Verdict Code is better suited for:

  • Deep integration into custom tools
  • Fine-grained performance monitoring
  • Complex multi-agent workflows

Claude Code is better suited for:

  • Quick setup and immediate productivity
  • Teams that don't need infrastructure overhead
  • Standardized Claude-focused workflows
Cost Implications

Verdict Code's Gateway architecture enables:

  • Cost optimization via skill routing (cheaper models for simple tasks)
  • Budget enforcement via max_cost_credits per task
  • Zero-cost options via local models (Ollama) with credit_multiplier=0.0
  • Unified billing across multiple providers

Claude Code's costs are:

  • Simpler - single Anthropic bill
  • Predictable - standard Anthropic pricing
  • Limited - no cost optimization strategies

Recommendation Matrix

Choose Claude Code When:

| Scenario | Rationale |
|---|---|
| Team already using Claude | No need for multi-model support |
| Minimal infrastructure desired | Just install CLI and start coding |
| Standard coding tasks | Bug fixes, features, refactoring within single repo |
| No custom integrations needed | Happy with Anthropic's toolset |
| Priority is simplicity over flexibility | Want polished UX without configuration |
| Budget allows Anthropic-only costs | Not concerned about cost optimization |
| Small team or individual developer | Don't need enterprise features |
| Short-term projects | Don't need persistent memory or session management |

Choose Verdict Code When:

| Scenario | Rationale |
|---|---|
| Multi-model strategy required | Need to mix Claude, GPT, local models based on task |
| Cost optimization is priority | Want to use cheaper models for simple tasks |
| Enterprise deployment | Need RBAC, telemetry, unified billing |
| Custom infrastructure integration | Need to integrate with existing services (databases, APIs) |
| Long-running agent sessions | Need context compaction and memory persistence |
| Multi-agent coordination | Need specialized sub-agents for different tasks |
| Custom tool development | Need to extend agent capabilities with domain-specific tools |
| Local model support | Want to use Ollama/Llama for zero-cost inference |
| Standardized benchmarking | Need CABF for objective performance measurement |
| Open-source requirement | Need to inspect/modify agent code |
| Skill management | Need reusable skill definitions across agents |
| MCP server integration | Need to integrate external model context services |
| Complex workflow orchestration | Need custom commands, hooks, and service coordination |
| Production AI platform | Building custom AI development platform |

Hybrid Approach

For teams with diverse needs:

  • Use Claude Code for individual developer productivity
  • Use Verdict Code for:
      • Automated pipelines with cost constraints
      • Multi-agent workflows requiring coordination
      • Integration with existing enterprise systems
      • Benchmarking and performance optimization
      • Local development with offline models

Conclusion

Claude Code and Verdict Code serve different segments of the AI-assisted development market. Claude Code excels as a polished, focused tool for Claude-centric workflows with minimal setup overhead. Verdict Code provides a comprehensive framework for building sophisticated AI development platforms with multi-model support, enterprise integration, and advanced agent coordination capabilities.

The choice depends on project requirements, team expertise, infrastructure constraints, and long-term scalability needs. For organizations investing in AI-augmented development at scale, Verdict Code's flexibility and extensibility may justify the additional complexity. For individual developers or teams focused on immediate productivity with Claude models, Claude Code offers a streamlined path to AI-assisted coding.

The CABF framework enables objective, data-driven comparison through standardized benchmarks, allowing teams to measure actual performance rather than relying on feature lists alone. This empirical approach is recommended for decision-making, especially when agent performance on specific task types is a critical factor.


References

  • CABF Specification: docs/SPECs/SPEC_Coding-Agent-Benchmark.md
  • Verdict Code SPEC: docs/SPECs/SPEC_Verdict-Code.md
  • Agent Adapters: tools/coding-agent-benchmark/cabf/agents/
  • Verdict Code Agent: tools/verdict-code/verdict_code/agent.py
  • Skills Manager SPEC: docs/SPECs/SPEC_Verdict_Skills_Manager.md
  • Gateway Documentation: docs/P0_END_TO_END_INTEGRATION.md

Document Version: 1.0
Last Modified: 2026-01-14
Next Review: 2026-02-14 or upon significant framework updates
