> Also available as a PDF: "co-op: The Invisible Switchboard".
>
> Source markdown lives in the Verdict project at docs/WHITEPAPERS/whitepaper_co_op_agent_coordination_skill.md; the canonical PDF is archived at docs/WHITEPAPERS/best/co-op_Invisible_Switchboard.pdf.
co-op: A Low-Token Coordination Layer for Multi-Agent Coding Harnesses
Version: 1.0
Status: Draft
Topic: Low-token coordination infrastructure for multi-agent coding harnesses
Abstract
As AI coding agents become more capable, a new failure mode is emerging: not individual agent incompetence, but multi-agent coordination collapse. Multiple agents can now operate inside the same repository, branch family, or coding harness, but they often lack a shared, low-cost mechanism for discovering one another, coordinating tasks, avoiding file conflicts, resolving blockers, and handing off work.
Naïve solutions resemble chat rooms, shared Markdown logs, or task boards. These approaches are insufficient for AI agents because they either consume too many tokens, require agents to perform inefficient filesystem searches, or fail to enforce bounded interaction. Human collaboration tools are optimized for human attention. Coding agents need something different: a compact, queryable, temporary coordination substrate with strict token controls.
co-op proposes such a substrate. It is a Python-mediated, SQLite-indexed, file-mirrored coordination skill for AI coding agents. Agents interact through a simple /co-op interface while the Python engine performs all heavy lifting: discovery, joining, leaving, token-capped reads, cursor tracking, file/task claims, bounded live threads, blocker/decision tracking, handoff generation, and human-readable exports.
The core thesis is simple:
> Multi-agent coding does not need more chat. It needs air-traffic control.
1. Introduction
AI coding tools are moving from single-agent assistance toward multi-agent collaboration. A developer may soon have several agents working in parallel:
This architecture promises major speedups, but it also introduces a coordination problem. Agents can collide on files, duplicate work, miss decisions, forget active blockers, or continue outdated assumptions. The more capable the agents become, the more dangerous uncoordinated parallelism becomes.
Humans solve this with meetings, Slack, Jira, Git branches, code reviews, and tribal memory. Agents cannot rely on those mechanisms directly. They are bounded by context windows, token costs, imperfect memory, and limited attention. Giving each agent a long shared transcript to read is not coordination; it is waste.
The co-op feature is designed to solve this problem at the harness level.
A co-op is a temporary coordination space for a small group of agents working on the same task, sprint, repo, or codebase region. Most co-ops are expected to contain two or three agents, with support for up to ten. A co-op normally lives for a day, may live for several days, and should rarely last longer than a week.
The goal is not to create a permanent knowledge base. The goal is to create a lightweight, tactical, low-token coordination layer for active coding work.
2. The Coordination Problem in Agentic Coding
2.1 The single-agent assumption is breaking
Many coding harnesses implicitly assume one agent is working on one coherent task. Even when tools support subagents or parallel sessions, coordination is often informal. The operator may manually tell each agent what others are doing, or agents may infer state from Git, files, logs, and prompts.
This does not scale.
Once multiple agents operate in the same repo, the system needs answers to basic questions:
Without a coordination layer, each agent must rediscover this state independently. That creates token waste and operational risk.
2.2 Human collaboration tools are the wrong abstraction
It is tempting to reuse human tools: chat, issue trackers, Markdown logs, shared docs, or project boards. These are useful for people, but they are poor primary coordination substrates for LLM agents.
A human can skim a Slack channel. An agent consumes tokens for every line it reads.
A human can infer that an old message is irrelevant. An agent may overweight stale context.
A human can ask, "Did anyone touch this file?" An agent may search the repo, inspect Git, read logs, and still miss the real answer.
A human chat room is optimized for social communication. An agent coordination layer must be optimized for:
The co-op design therefore rejects the idea that the primary artifact should be a shared conversation transcript.
2.3 The token problem is the central constraint
The critical requirement is not merely that agents can communicate. They must communicate with very low overhead.
The target is:
> co-op should consume less than 5% of total agent token usage.
This requirement eliminates most obvious designs. A Markdown transcript, even if easy to implement, will grow rapidly. Agents reading or grepping it repeatedly will burn context and cost. Even if the raw storage is file-based, the access pattern must behave like SQL: selective, indexed, budgeted, and relevant.
The co-op system therefore makes Python, not the agent, responsible for retrieval.
3. Design Philosophy
co-op is based on six design principles.
3.1 Python does the heavy lifting
Agents should not perform filesystem archaeology. They should not grep event folders, scan transcripts, parse raw JSON files, or write coordination state by hand.
Agents call:
/co-op
The /co-op command calls the Python engine. The Python engine validates inputs, queries SQLite, updates files, enforces budgets, and returns a compact answer.
This turns the agent into a client, not the database engine.
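The client/engine split can be illustrated with a minimal dispatcher sketch. The handler names and reply strings below are hypothetical, not part of the co-op specification; the only point is that the agent sends one short command and gets back one compact reply.

```python
import shlex

def handle_status(args):
    # Hypothetical handler: a real engine would query SQLite here.
    return "status: 0 active claims, 0 open blockers"

def handle_read(args):
    # Hypothetical handler: a real engine would return a budgeted delta.
    return "read: no new events"

# Assumed verb table; the real command surface is larger.
HANDLERS = {"status": handle_status, "read": handle_read}

def dispatch(command_line):
    """Route '/co-op <verb> [args...]' to a handler and return its compact reply."""
    parts = shlex.split(command_line)
    if not parts or parts[0] != "/co-op":
        return "error: not a /co-op command"
    if len(parts) < 2:
        return "error: missing verb"
    handler = HANDLERS.get(parts[1])
    if handler is None:
        return f"error: unknown verb '{parts[1]}'"
    return handler(parts[2:])
```

The agent never sees the handler internals or the store, only the returned string.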
3.2 SQLite is the operational authority
SQLite provides the selective retrieval needed to keep tokens low. It can answer questions such as:
The system can still mirror all events into files for transparency and recovery. But normal agent reads should come from Python-mediated SQLite queries, not raw file scans.
3.3 Files are audit and export artifacts
File-based artifacts are still important. They provide:
But files are not the normal operational read path. This is a crucial distinction.
3.4 Conversation is bounded
Agent-agent live communication is valuable, but dangerous if unbounded. LLM agents can easily turn a small clarification into a long recursive debate.
co-op supports live threads, but each thread has strict limits on message count, token budget, participants, and lifetime.
The live thread is a tactical exchange, not a chat room.
3.5 Status is hot, history is cold
Agents usually need current state, not full history.
The co-op engine maintains compact hot-state summaries:
Historical events remain available, but they are cold storage. They are queried only when needed.
3.6 Leaving is as important as joining
Agent exits are a major source of lost work. Agents may stop because of context limits, operator interruption, tool failure, completion, or timeout.
A co-op must make leaving cheap and safe. /co-op leave automatically writes a handoff, releases or expires claims, preserves blockers, updates summaries, and marks the agent inactive.
The principle is:
> No agent should disappear without leaving behind a useful next-action packet.
4. System Overview
At a high level, co-op has four layers:
Agent or operator
↓
/co-op slash command or co-op CLI
↓
Python co-op engine
↓
SQLite operational store + file artifact mirror
The agent sees only the command surface. The Python engine owns state management.
4.1 Command surface
Representative commands include:
/co-op start
/co-op discover
/co-op join
/co-op read
/co-op update
/co-op claim
/co-op ask
/co-op decide
/co-op thread start
/co-op thread reply
/co-op leave
/co-op export
The command surface is intentionally small and mnemonic.
4.2 Python engine
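The Python engine handles discovery, joining, leaving, token-capped reads, cursor tracking, file/task claims, bounded live threads, blocker/decision tracking, handoff generation, and human-readable exports.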
This is the real product. The slash command is only a thin interface.
4.3 SQLite store
SQLite stores operational state:
SQLite gives co-op its SQL-like precision and low-token retrieval behavior.
4.4 File mirror
The file mirror provides transparent artifacts:
.co-op/
co-op.db
co-op_config.yaml
co-op_registry.json
co-op_hot.md
co-op_status.json
co-op_transcript.md
spaces/
co-op_2026-05-03_09-00-00_hmi-settings-flow/
co-op_space.json
co-op_hot.md
co-op_status.json
co-op_tasks.json
co-op_claims.json
co-op_blockers.json
co-op_threads.json
files/
events/
threads/
exports/
handoffs/
File names use the co-op_ prefix for clarity and easy sorting.
5. Discovery: Finding Available Co-ops
A multi-agent system needs a simple answer to the question:
> What coordination spaces are available to join?
The /co-op discover command lists active, idle, stale, and closed co-ops in the current repo.
Example:
Available co-ops:
hmi-settings-flow active 2 agents updated 4m ago
Goal: Add settings API and UI flow.
image-pricing-prd idle 1 agent updated 2h ago
Goal: Draft image pricing PRD.
Default: hmi-settings-flow
Discovery is not a transcript search. It is a compact metadata query. The agent receives only enough information to choose a co-op.
Statuses are defined operationally:
active: updated within last 30 minutes or active live thread exists
idle: no update for 30 minutes to 2 hours
stale: no update for 2 hours to 6 hours
closed: explicitly closed or expired
expired: older than configured max lifetime
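The status definitions above can be sketched as a small classifier. Several details are assumptions: the function name, the boundary handling (inclusive upper bounds), the seven-day default lifetime, and the choice to keep reporting "stale" for a co-op that has been quiet for more than six hours but has not yet expired, which the definitions do not cover.

```python
from datetime import datetime, timedelta

def classify_status(last_update, now, has_live_thread=False,
                    explicitly_closed=False, created_at=None,
                    max_lifetime=timedelta(days=7)):
    """Map co-op metadata to a discovery status string."""
    if explicitly_closed:
        return "closed"
    # Assumption: lifetime is measured from creation time.
    if created_at is not None and now - created_at > max_lifetime:
        return "expired"
    if has_live_thread:
        return "active"
    age = now - last_update
    if age <= timedelta(minutes=30):
        return "active"
    if age <= timedelta(hours=2):
        return "idle"
    # Assumption: anything quieter than 2 hours stays "stale" until it expires.
    return "stale"

if __name__ == "__main__":
    now = datetime(2026, 5, 3, 12, 0)
    print(classify_status(now - timedelta(minutes=4), now))
    print(classify_status(now - timedelta(hours=2), now))
```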
If exactly one active co-op exists, /co-op join can skip discovery and join directly. If multiple active co-ops exist, the agent receives a compact numbered list and may join with /co-op join 1.
This keeps operator typing low while preserving safety in ambiguous cases.
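A minimal sketch of the join rule, assuming the engine already has the list of active co-op slugs. The function name, return shape, and message wording are illustrative only.

```python
def resolve_join(active_coops, selection=None):
    """Return (joined_slug_or_None, message) for a '/co-op join [n]' request."""
    if not active_coops:
        return None, "No active co-ops. Run /co-op start."
    if len(active_coops) == 1:
        # Unambiguous: join directly without a discovery step.
        return active_coops[0], f"Joined co-op: {active_coops[0]}"
    if selection is not None:
        if 1 <= selection <= len(active_coops):
            slug = active_coops[selection - 1]
            return slug, f"Joined co-op: {slug}"
        return None, f"Invalid selection {selection}."
    # Ambiguous: never guess, return a compact numbered list instead.
    lines = [f"{i}. {slug}" for i, slug in enumerate(active_coops, start=1)]
    return None, "Multiple active co-ops:\n" + "\n".join(lines) + "\nJoin with /co-op join <number>"
```

Returning a list instead of guessing costs the operator one extra short command, but only in the ambiguous case.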
6. Minimal-Typing Lifecycle
The co-op system is designed to minimize operator burden. The ideal workflow is one command to start, one command to join, and one command to leave.
6.1 Starting a co-op
Preferred start:
/co-op start "Update HMI settings flow"
Ultra-minimal start:
/co-op start
If no title is supplied, the Python engine may infer the co-op slug from the current Git branch, working directory, or project name, or fall back to general-work.
The engine creates:
.co-op/ folder if needed.
It then returns a compact invite instruction:
Created co-op: hmi-settings-flow
To join from any agent in this repo:
/co-op join
6.2 Joining a co-op
Minimal join:
/co-op join
If exactly one active co-op exists, the agent joins it. The engine auto-detects the branch, tool identity, and working directory, and assigns a visible name.
Visible agent names follow the pattern of a short friendly name followed by the co-op slug.
Examples:
Sky hmi-settings-flow
Charlie hmi-settings-flow
River hmi-settings-flow
The second word ties the agent to the co-op. The first word is lightweight and human-friendly.
Internally, the engine assigns a stable ID so visible name collisions do not matter.
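One way to pair a friendly visible name with a stable internal ID is sketched below. The name pool reuses the three example names; the fallback naming scheme and the use of a random UUID as the stable ID are assumptions.

```python
import uuid

# Pool taken from the examples above; a real engine would likely have more names.
FRIENDLY_NAMES = ["Sky", "Charlie", "River"]

def assign_identity(coop_slug, taken_names):
    """Pick the first unused friendly name and attach a collision-proof ID."""
    for name in FRIENDLY_NAMES:
        if name not in taken_names:
            first = name
            break
    else:
        # Assumption: fall back to a numbered name once the pool is exhausted.
        first = f"Agent{len(taken_names) + 1}"
    return {
        "agent_id": uuid.uuid4().hex,  # stable internal identity
        "visible_name": f"{first} {coop_slug}",
    }
```

Because lookups use `agent_id`, two agents that somehow share a visible name would still be distinguishable.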
6.3 Inviting other agents
The operator can run:
/co-op invite
The system returns a pasteable block:
Paste this into each agent:
/co-op join
/co-op read
This is important. The operator should not have to explain the current coordination structure repeatedly.
6.4 Leaving a co-op
Minimal leave:
/co-op leave
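The engine automatically writes a handoff, releases or expires claims, preserves blockers, updates summaries, and marks the agent inactive.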
Optional reason:
/co-op leave "context limit"
Leaving should be easier than silently disappearing.
7. Token Control Architecture
The central technical challenge is token control.
A file-based system alone does not solve token usage. A raw file system can be worse than SQL if agents are allowed to inspect everything. The token savings come from the Python query layer.
7.1 The read firewall
Agents are instructed not to read raw state:
Forbidden during normal operation:
cat .co-op/co-op_transcript.md
grep -R .co-op
cat .co-op/events/*.json
direct SQLite queries
manual JSON edits
Allowed:
/co-op discover
/co-op status
/co-op read
/co-op thread read
/co-op task list
/co-op blocker list
This is the read firewall. It ensures all agent-visible state is curated, budgeted, and relevant.
7.2 Cursor-based deltas
Each agent has a cursor. The engine tracks what the agent has already seen.
When the agent calls:
/co-op read
Python returns:
This prevents repetitive rereading.
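A cursor-based delta read can be sketched with the standard `sqlite3` module. The table and column names here are hypothetical; the technique is simply "remember the highest event ID each agent has seen and return only newer rows".

```python
import sqlite3

def open_store():
    """Create an in-memory store with hypothetical events and cursors tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            agent TEXT NOT NULL,
            kind TEXT NOT NULL,
            body TEXT NOT NULL
        );
        CREATE TABLE cursors (
            agent TEXT PRIMARY KEY,
            last_seen_id INTEGER NOT NULL
        );
    """)
    return conn

def post_event(conn, agent, kind, body):
    with conn:  # commit on success
        conn.execute(
            "INSERT INTO events (agent, kind, body) VALUES (?, ?, ?)",
            (agent, kind, body),
        )

def read_delta(conn, agent, limit=20):
    """Return events newer than the agent's cursor, then advance the cursor."""
    row = conn.execute(
        "SELECT last_seen_id FROM cursors WHERE agent = ?", (agent,)
    ).fetchone()
    last_seen = row[0] if row else 0
    rows = conn.execute(
        "SELECT id, agent, kind, body FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen, limit),
    ).fetchall()
    if rows:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO cursors (agent, last_seen_id) VALUES (?, ?)",
                (agent, rows[-1][0]),
            )
    return rows
```

A second `read_delta` call with no intervening events returns an empty list, so an idle agent pays almost nothing to stay current.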
7.3 Hot-state summaries
The engine maintains generated summaries such as:
co-op_hot.md
co-op_tasks.json
co-op_claims.json
co-op_blockers.json
co-op_threads.json
These are small materialized views. They are generated by Python, not manually edited by agents.
7.4 Output caps
Commands have default and hard token budgets.
Representative caps:
/co-op discover: default 400, hard 700
/co-op status: default 500, hard 800
/co-op read: default 800, hard 1,200
/co-op thread read: default 400, hard 700
/co-op update response: default 150, hard 250
/co-op handoff: default 800, hard 1,500
If a result exceeds budget, Python must summarize or request a narrower query.
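The cap table above can be enforced with a small budgeting helper. This sketch takes the "request a narrower query" branch by truncating and appending a notice; it does not summarize. The token estimate of roughly four characters per token is a crude assumption, and the function names are illustrative.

```python
# (default, hard) token budgets per command, copied from the caps listed above.
BUDGETS = {
    "discover": (400, 700),
    "status": (500, 800),
    "read": (800, 1200),
    "thread read": (400, 700),
    "update": (150, 250),
    "handoff": (800, 1500),
}

def estimate_tokens(text):
    # Assumption: about 4 characters per token, rounded up.
    return (len(text) + 3) // 4

def enforce_budget(command, lines, requested=None):
    """Keep whole lines until the budget is spent; never exceed the hard cap."""
    default, hard = BUDGETS[command]
    budget = min(requested or default, hard)
    kept, used = [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    omitted = len(lines) - len(kept)
    if omitted:
        # The short notice itself is not counted against the budget in this sketch.
        kept.append(f"[{omitted} more items omitted; narrow the query]")
    return kept, used
```

A caller may ask for more than the default, but the `min` against the hard cap means no flag can push a response past it.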
7.5 Why SQL matters
SQLite is not included for storage convenience. It is included because low-token coordination requires selective retrieval.
A good co-op read behaves like:
SELECT active claims
SELECT unresolved blockers
SELECT messages where recipient = this agent
SELECT events since this agent's cursor
SELECT decisions made after last_seen
The agent receives the result, not the database.
This is how a file-backed system can still behave like a low-token SQL system.
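The pseudo-queries above might translate into concrete SQLite statements along these lines. The schema, sample rows, and output format are all hypothetical; the sketch only shows that a read is a handful of narrow queries whose results are flattened into a few short lines.

```python
import sqlite3

def open_demo_store():
    """Build a tiny in-memory store with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE claims (path TEXT, agent TEXT, active INTEGER);
        CREATE TABLE blockers (id TEXT, question TEXT, resolved INTEGER);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, recipient TEXT, body TEXT);
    """)
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", [
        ("services/hmi/settings_api.py", "Sky", 1),
        ("docs/old_notes.md", "River", 0),
    ])
    conn.executemany("INSERT INTO blockers VALUES (?, ?, ?)", [
        ("BLOCK-003", "Should settings persist to JSON or SQLite?", 0),
        ("BLOCK-001", "Which branch?", 1),
    ])
    conn.executemany("INSERT INTO messages (recipient, body) VALUES (?, ?)", [
        ("Charlie", "Backend exposes GET/PUT."),
        ("River", "Please review docs."),
    ])
    conn.commit()
    return conn

def compact_read(conn, agent):
    """Return only active claims, open blockers, and this agent's messages."""
    claims = conn.execute(
        "SELECT path, agent FROM claims WHERE active = 1 ORDER BY path"
    ).fetchall()
    blockers = conn.execute(
        "SELECT id, question FROM blockers WHERE resolved = 0 ORDER BY id"
    ).fetchall()
    inbox = conn.execute(
        "SELECT body FROM messages WHERE recipient = ? ORDER BY id", (agent,)
    ).fetchall()
    lines = [f"claim {path} by {owner}" for path, owner in claims]
    lines += [f"blocker {bid}: {question}" for bid, question in blockers]
    lines += [f"msg: {body}" for (body,) in inbox]
    return lines
```

Released claims, resolved blockers, and other agents' messages never reach the caller's context.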
8. Coordination Primitives
co-op includes a small set of primitives tailored to coding work.
8.1 Check-ins
Check-ins are structured status updates. They include:
Check-ins are useful, but they should not be required for every event. A blocker, claim, or live-thread message should have its own lightweight schema.
8.2 Tasks
Tasks are lightweight work items. They are not meant to replace a full project manager.
Commands:
/co-op task add "Implement settings API"
/co-op task claim TASK-004
/co-op task done TASK-004
/co-op task list
8.3 Claims
Claims prevent collisions. An agent can claim a file, directory, branch, task, or conceptual component.
Examples:
/co-op claim services/hmi/settings_api.py
/co-op claim frontend/settings/ImageSettings.tsx --ttl 45m
/co-op release services/hmi/settings_api.py
Claims have TTLs so stale agents do not block work indefinitely.
The Python engine detects overlap and warns agents before they collide.
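TTL expiry and overlap detection can be sketched in memory as follows. The class name, the 45-minute default TTL (borrowed from the example flag, not a documented default), and the choice to refuse rather than merely warn on overlap are assumptions. Overlap here means the two paths are equal or one is an ancestor directory of the other.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

def paths_overlap(a, b):
    """True if the paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents

class ClaimBoard:
    def __init__(self, default_ttl=timedelta(minutes=45)):
        self.default_ttl = default_ttl
        self.claims = {}  # path -> (agent, expires_at)

    def expire(self, now):
        # Drop claims whose TTL has elapsed so stale agents cannot block work.
        self.claims = {p: c for p, c in self.claims.items() if c[1] > now}

    def claim(self, agent, path, now, ttl=None):
        """Return (granted, conflicts); conflicts lists overlapping claims by others."""
        self.expire(now)
        conflicts = [(p, owner) for p, (owner, _) in self.claims.items()
                     if owner != agent and paths_overlap(p, path)]
        if conflicts:
            return False, conflicts
        self.claims[path] = (agent, now + (ttl or self.default_ttl))
        return True, []

    def release(self, agent, path):
        if self.claims.get(path, (None, None))[0] == agent:
            del self.claims[path]
            return True
        return False

if __name__ == "__main__":
    board = ClaimBoard()
    t0 = datetime(2026, 5, 3, 9, 0)
    print(board.claim("Sky", "services/hmi/settings_api.py", t0))
    print(board.claim("Charlie", "services/hmi", t0))
```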
8.4 Blockers
Blockers are unresolved questions or impediments.
Example:
/co-op ask "Should settings persist to JSON or SQLite?"
Blockers remain visible in hot state until resolved.
8.5 Decisions
Decisions resolve blockers and prevent repeated debate.
Example:
/co-op decide BLOCK-003 "Use project-local JSON for v1; defer SQLite."
Decisions are durable and included in summaries when relevant.
8.6 Handoffs
Handoffs are structured exit packets.
A handoff includes:
Handoffs are generated automatically by /co-op leave and may also be written explicitly with /co-op handoff.
9. Bounded Live Communication
Live communication is a necessary feature, but it must not become a token sink.
9.1 Purpose
Live threads are for short tactical exchanges:
They are not for long architectural debates.
9.2 Limits
Default limits:
Max messages: 6 total
Max token budget: 800 total
Max lifetime: 30 minutes
Participants: 2 default
Hard limits:
Max messages: 10 total
Max token budget: 1,000 total
Max lifetime: 60 minutes
Participants: 4 max
The recommended interpretation is 10 total messages, not 10 back-and-forth pairs.
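The default and hard limits above can be enforced at post time. This is a sketch only: the class and field names are hypothetical, the token cost uses a rough four-characters-per-token estimate, and requested limits above the hard caps are silently clamped, which is one possible policy among several.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

HARD_MAX_MESSAGES = 10
HARD_MAX_TOKENS = 1000
HARD_MAX_LIFETIME = timedelta(minutes=60)
HARD_MAX_PARTICIPANTS = 4

@dataclass
class LiveThread:
    started_at: datetime
    participants: tuple
    max_messages: int = 6
    max_tokens: int = 800
    max_lifetime: timedelta = timedelta(minutes=30)
    messages: list = field(default_factory=list)
    tokens_used: int = 0

    def __post_init__(self):
        if len(self.participants) > HARD_MAX_PARTICIPANTS:
            raise ValueError("too many participants")
        # Clamp any requested limits to the hard caps.
        self.max_messages = min(self.max_messages, HARD_MAX_MESSAGES)
        self.max_tokens = min(self.max_tokens, HARD_MAX_TOKENS)
        self.max_lifetime = min(self.max_lifetime, HARD_MAX_LIFETIME)

    def post(self, sender, text, now):
        """Accept the message only if every cap still has room."""
        if sender not in self.participants:
            return "rejected: not a participant"
        if now - self.started_at > self.max_lifetime:
            return "rejected: thread expired"
        if len(self.messages) >= self.max_messages:
            return "rejected: message cap reached"
        cost = (len(text) + 3) // 4  # assumed token estimate
        if self.tokens_used + cost > self.max_tokens:
            return "rejected: token budget exhausted"
        self.messages.append((sender, text))
        self.tokens_used += cost
        return "ok"
```

Note that `max_messages` counts every message in the thread, matching the "total messages" interpretation rather than back-and-forth pairs.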
9.3 One-minute loop during live threads
When an agent-agent live thread starts, participating agents drop to a one-minute loop interval until the thread terminates.
Rule:
If this agent is a participant in an active live thread:
loop interval = 1 minute
Else:
use normal adaptive cadence
Only participants poll at the one-minute cadence. Non-participants remain on normal cadence.
9.4 Thread lifecycle
Threads move through states:
open
waiting
resolved
expired
summarized
archived
When a thread closes, Python writes a compact summary:
{
"type": "thread_summary",
"thread_id": "THREAD-003",
"topic": "Persistence choice",
"outcome": "Use project-local JSON for v1. Defer SQLite until multi-profile support.",
"decisions": ["DEC-002"],
"open_items": [],
"tokens_est": 742,
"message_count": 7
}
Agents normally read the summary, not the full message history.
10. Looping and Activity Management
The /co-op loop command supports periodic coordination checks.
Default:
/co-op loop --period 30m --adaptive
Cadence states:
NORMAL_LOOP every 30 minutes
ACTIVE_LOOP every 10 minutes
LIVE_THREAD_LOOP every 1 minute
STALE_LOOP every 2 hours
STOPPED after 6 hours with no meaningful updates
Meaningful updates include:
Non-meaningful updates include:
After six hours without meaningful updates, the loop writes a summary/handoff and stops.
The loop should read often enough to remain useful, but write only on meaningful state changes.
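The cadence states could be selected by a function like the one below. The intervals come from the table above, but the transition thresholds are assumptions: this sketch treats a meaningful update within the last 10 minutes as ACTIVE_LOOP and two or more quiet hours as STALE_LOOP, neither of which the design pins down. It also lets an active live thread override everything else.

```python
from datetime import timedelta

def choose_cadence(in_live_thread, idle_for):
    """Return (state, interval) given time since the last meaningful update."""
    if in_live_thread:
        # Participants poll every minute until the thread terminates.
        return "LIVE_THREAD_LOOP", timedelta(minutes=1)
    if idle_for >= timedelta(hours=6):
        # Caller should write a summary/handoff and stop looping.
        return "STOPPED", None
    if idle_for >= timedelta(hours=2):  # assumed threshold
        return "STALE_LOOP", timedelta(hours=2)
    if idle_for <= timedelta(minutes=10):  # assumed threshold
        return "ACTIVE_LOOP", timedelta(minutes=10)
    return "NORMAL_LOOP", timedelta(minutes=30)
```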
11. Storage and Recovery
The recommended authority model is:
SQLite = operational authority
Files = mirrored audit/export artifacts
However, the system must also support recovery:
/co-op rebuild-index
This command rebuilds SQLite from file artifacts if the database is lost or corrupted.
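A rebuild pass might look like the following sketch, assuming each mirrored event is a JSON file with `id`, `kind`, and `body` keys. That file layout is hypothetical, as are the function and table names. Unreadable or incomplete files are skipped and counted rather than aborting the rebuild.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def rebuild_index(events_dir, db_path=":memory:"):
    """Recreate the events table from mirrored JSON files; return (conn, skipped)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(id INTEGER PRIMARY KEY, kind TEXT NOT NULL, body TEXT NOT NULL)"
    )
    skipped = 0
    with conn:  # one transaction for the whole rebuild
        for path in sorted(Path(events_dir).glob("*.json")):
            try:
                event = json.loads(path.read_text())
                row = (event["id"], event["kind"], event["body"])
            except (json.JSONDecodeError, KeyError, TypeError):
                skipped += 1  # corrupt or incomplete mirror file
                continue
            conn.execute("INSERT OR REPLACE INTO events VALUES (?, ?, ?)", row)
    return conn, skipped

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "0001.json").write_text(
            json.dumps({"id": 1, "kind": "claim", "body": "a.py"})
        )
        conn, skipped = rebuild_index(tmp)
        print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0], skipped)
```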
11.1 Write process
Every write follows a disciplined process:
If mirroring fails, the system must either roll back or mark the event as file_sync_failed for repair.
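The "mark for repair" branch of that rule can be sketched as follows. The schema, the `sync_status` column, and the mirror file naming are assumptions; the sketch commits to SQLite first, attempts the file mirror, and records `file_sync_failed` if the mirror write raises. The rollback alternative is not shown.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def open_store():
    """Create an in-memory store with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "kind TEXT NOT NULL, body TEXT NOT NULL, sync_status TEXT NOT NULL)"
    )
    return conn

def write_event(conn, mirror_dir, kind, body):
    """Commit to SQLite first, then mirror to a file and record the outcome."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO events (kind, body, sync_status) VALUES (?, ?, 'pending')",
            (kind, body),
        )
        event_id = cur.lastrowid
    try:
        path = Path(mirror_dir) / f"co-op_event_{event_id:06d}.json"
        path.write_text(json.dumps({"id": event_id, "kind": kind, "body": body}))
        status = "synced"
    except OSError:
        # Mirror failed: keep the row but flag it for a later repair pass.
        status = "file_sync_failed"
    with conn:
        conn.execute(
            "UPDATE events SET sync_status = ? WHERE id = ?", (status, event_id)
        )
    return event_id, status

if __name__ == "__main__":
    store = open_store()
    with tempfile.TemporaryDirectory() as tmp:
        print(write_event(store, tmp, "claim", "services/hmi/settings_api.py"))
        print(write_event(store, Path(tmp) / "missing", "blocker", "JSON or SQLite?"))
```

A repair job can later select rows where `sync_status = 'file_sync_failed'` and retry the mirror write.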
11.2 Why mirror files at all?
Because agents and operators benefit from transparent artifacts:
The mirror is not redundant. It is operationally secondary but strategically valuable.
12. Security and Safety Considerations
co-op is a local coordination layer, but it still needs guardrails.
12.1 Prevent accidental state corruption
Agents must not manually edit SQLite or raw co-op files. All mutation goes through Python.
12.2 Prevent token blowups
All command outputs are capped. Full exports are written to files, not returned into the agent context.
12.3 Prevent stale claims
Claims require TTLs and should expire automatically.
12.4 Prevent live-thread runaway
Threads have hard message, token, participant, and lifetime caps.
12.5 Prevent ambiguous joins
If more than one active co-op exists, /co-op join should require a numbered selection rather than guessing silently.
13. Example Workflow
A human operator starts a co-op:
/co-op start "Update HMI settings flow"
The system replies:
Created co-op: hmi-settings-flow
To join from any agent in this repo:
/co-op join
Agent A joins:
/co-op join
/co-op read
/co-op claim services/hmi/settings_api.py
Agent B joins:
/co-op join
/co-op read
/co-op claim frontend/settings/ImageSettings.tsx
Agent A asks a tactical question:
/co-op thread start @Charlie "Persistence choice"
The thread participants drop to one-minute loop cadence.
Agent B replies:
/co-op thread reply THREAD-003 "JSON is fine for v1 if backend exposes GET/PUT."
The thread closes:
/co-op thread close THREAD-003 --summary "Use project-local JSON for v1."
Agent A leaves:
/co-op leave "backend done"
The system writes a handoff, releases claims, updates summaries, and marks Agent A inactive.
At the end of the sprint, the operator exports:
/co-op export md
The result is a human-readable transcript and summary of the co-op.
14. Comparison to Alternative Approaches
14.1 Shared Markdown log
A shared Markdown log is easy to implement but poor as a primary agent interface. It grows indefinitely, encourages full-context rereads, has weak concurrency semantics, and requires agents to parse unstructured history.
co-op uses Markdown only as an export and hot-summary format, not as the operational source.
14.2 Pure SQL database
A pure SQL database is excellent for retrieval but less transparent to humans and less portable as a visible audit artifact.
co-op keeps SQL for operations and files for audit/export.
14.3 Human chat tool
A chat tool enables communication but does not enforce token budgets, claims, decisions, handoffs, or lifecycle expiration.
co-op treats chat as a bounded live thread inside a coordination system.
14.4 Git-only coordination
Git shows what changed, but not intent, live claims, blockers, decisions, or handoffs. Git is necessary but insufficient.
co-op complements Git by coordinating before and during edits.
14.5 Full project management system
A full PM system is too heavy for ephemeral agent coordination. It may be useful for humans, but it is not optimized for low-token agent reads.
co-op is sprint-local, temporary, and tactical.
15. Why co-op Matters
The next major productivity jump in AI coding will not come only from smarter individual models. It will come from better orchestration of multiple imperfect agents.
But multi-agent systems fail when coordination costs exceed the benefit of parallelism. If every agent must read long logs, ask the operator for state, or infer what others are doing, parallelism becomes noise.
co-op reduces that coordination cost by making active shared state:
It gives agents just enough shared awareness to work together without drowning in context.
The product intuition is:
> The coordination layer should be smaller than the work it coordinates.
That is the essence of co-op.
16. Future Directions
16.1 Merge-risk radar
The engine can analyze active claims, changed files, and branch names to detect likely merge conflicts before they happen.
16.2 Agent capability profiles
Agents may declare strengths, such as frontend, backend, tests, docs, refactor, or security review. The engine can suggest task allocation accordingly.
16.3 Push notifications
Polling is sufficient for MVP, but future versions may support local WebSocket, SSE, or filesystem event notifications.
16.4 Dashboard
A small local dashboard could show active agents, claims, blockers, decisions, and threads.
16.5 Cross-repo co-ops
Future versions may support co-ops spanning multiple repositories, though this increases complexity significantly.
16.6 Post-sprint analytics
Exports could feed DuckDB or analytics tools to answer questions such as:
17. Conclusion
co-op is a small but important primitive for the next phase of AI coding systems.
It recognizes that multi-agent coding does not need an unbounded shared conversation. It needs a disciplined coordination substrate: one that lets agents discover each other, join quickly, claim work, resolve blockers, exchange bounded live messages, leave cleanly, and preserve human-readable records.
The architecture is intentionally practical:
/co-op command
→ Python engine
→ SQLite operational store
→ file-based audit/export mirror
→ token-capped response
This gives the system the query precision of SQL, the transparency of files, and the simplicity of a slash command.
The result is a design-time tool that allows small groups of coding agents to coordinate effectively without consuming the very context they need to do the work.
In short:
> co-op is not agent chat.
> co-op is low-token air-traffic control for coding agents.