The Role System — Architecture
This marketplace grew from a set of independent skills into a small system: three orchestrators — dev-crew, brainstorm-panel, and research-sweep — over one shared substrate, the roles plugin (four plugins total). They all share one underlying abstraction, the evolving role. This page explains that abstraction end to end: the problem it solves, the file model, how roles evolve without corrupting each other, and how everything degrades gracefully when you install only part of it.
The idea in one sentence
The role is the unit of reuse; the orchestrators are just contexts a role runs in. One persona — say a senior debugger — can run several ways without being redefined: invoked solo for a quick pass, or seated into any of the three orchestrators (as a gated crew role inside a delivery relay, a panel lens in a brainstorming session, or a research verifier in a coverage sweep). Its identity lives in one place and its hard-won lessons accumulate there, regardless of which context produced them.
The roles plugin is the substrate — one role, one solo pass; the three orchestrators each compose a
task-fit roster of those roles toward a different end:
| Mode | Plugin | Unit of work | End |
|---|---|---|---|
| Solo | | one role, one pass, one context | the substrate (cheapest, instant) |
| Panel | | many roles in parallel, critique + converge | decide |
| Relay | | many roles, gated pipeline, file handoffs | deliver |
| Sweep | | many roles fan-out, synthesize + verify | discover |
The three orchestrators (and what roles is)
roles is not an orchestrator — it is the substrate: the shared, evolving talent pool plus solo
invocation (/roles:as). It’s the noun the three verbs operate on. Each orchestrator composes a
task-fit roster of those roles, but they differ on every axis below.
| Axis | brainstorm-panel (decide) | dev-crew (deliver) | research-sweep (discover) |
|---|---|---|---|
| Produces | a judgment / decision (advisory, no artifact shipped) | a shipped target, gated | verified, cited findings |
| Roles relate by | disagreement: the clash is the point | handoff: sequential, each builds on the last | independence: disjoint coverage, no clash |
| Flow | parallel diverge → converge | sequential gated relay | parallel fan-out → synthesize + verify |
| Compose roles from | quality axes (perspectives) | the delivery target (functions) | the information space (coverage angles) |
| Guards against | groupthink / blind spots (unanimity = red flag) | shipping broken / unverified work | incomplete coverage + unverified facts |
Two properties tie them together:
- They chain. research (discover the facts) → panel (decide what to do) → crew (deliver it).
- They share roles. The `skeptic` is a panel seat, a crew adversarial check, and a research fact-verifier: one evolving persona, three contexts. That cross-context reuse is exactly what the shared core (below) exists for; research-sweep is the third consumer that proves it.
The problem it solves
Before this system, each of the two original orchestrators had exactly half of the right mechanism:
| | Formation (how the team is picked) | Evolution (do roles improve?) |
|---|---|---|
| brainstorm-panel | ✅ Dynamic: seats derived from the task | ❌ Ephemeral: every seat re-invented cold each run; accumulated wisdom buried in log prose |
| dev-crew | ❌ Static: category → fixed lineup lookup | ✅ Roles persist with learnings, model tiers, a probationary→stable lifecycle |
Worse, the same persona could live in both worlds with unconnected lessons — an art-historian role that learned one thing as a crew member and another as a panel seat, its knowledge split across two files that never talked. The role system gives both orchestrators both halves: dynamic formation and evolving roles, over one shared talent pool.
The file model — everything under .claude/roles/
A per-repo directory is the single home. It is created in the repository (not in a plugin’s install
directory) for a hard reason: a marketplace-installed plugin’s files are a read-only cache
(~/.claude/plugins/cache/…), so a plugin can never evolve a registry that lives next to its own skill.
The registry must live in the repo.

    .claude/roles/
      crew.md        # dev-crew's role registry (one writer: dev-crew)
      panel.md       # brainstorm-panel's registry (one writer: brainstorm-panel)
      research.md    # research-sweep's role registry (one writer: research-sweep)
      registry.md    # auto-generated index of shared core roles (written by the roles plugin hook)
      <role>.md      # shared core role files (one writer: the roles plugin / user-gated graduation)

The defining property: every file has exactly one writer. No write contention, no lane-violation risk, no schema drift between plugins — by construction, not by convention.
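
The single-writer rule can be pictured as a lookup from file name to owning plugin. The sketch below is illustrative only: the file names come from the listing above, while `REGISTRY_WRITERS`, `writer_for`, and `may_write` are hypothetical names, and nothing here claims the plugins enforce the rule this way.

```python
from pathlib import PurePosixPath

# File names are taken from the listing above; the mapping itself is a
# hypothetical illustration of "every file has exactly one writer".
REGISTRY_WRITERS = {
    "crew.md": "dev-crew",
    "panel.md": "brainstorm-panel",
    "research.md": "research-sweep",
    "registry.md": "roles",
}


def writer_for(path: str) -> str:
    """Return the one plugin allowed to write a file under .claude/roles/."""
    name = PurePosixPath(path).name
    # Any other <role>.md is a shared core role file, owned by the roles plugin
    # (graduations into it are user-gated).
    return REGISTRY_WRITERS.get(name, "roles")


def may_write(plugin: str, path: str) -> bool:
    """True only when `plugin` is the single writer of `path`."""
    return writer_for(path) == plugin
```

With this shape, `may_write("brainstorm-panel", ".claude/roles/crew.md")` is false, which is the property the file model relies on.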
Two layers
| Layer | What it is |
|---|---|
| Layer 1: local registries (unconditional) | Each orchestrator keeps its own registry ( |
| Layer 2: shared core (optional, the | Shared |
The shared core role file
Written only by the roles plugin (and user-gated graduations):

    ## Charter           ← one-sentence mandate (keeps the role in lane everywhere)
    ## When to use       ← trigger axes; consumers match against these to seat the role
    ## Body              ← the full persona method + deliverables
    ## Learnings (core)  ← context-INDEPENDENT lessons; arrive only by GRADUATION, never direct append
    ## Learnings (solo)  ← lessons from /roles:as runs (free-append)

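
A consumer that seats a role needs to read those sections back out of the file. As a rough sketch, assuming each section starts at a line beginning with `## ` (the function names and the `EXPECTED_SECTIONS` tuple are hypothetical, not part of the roles plugin):

```python
# Section names as listed above; treating them as required is an assumption.
EXPECTED_SECTIONS = (
    "Charter",
    "When to use",
    "Body",
    "Learnings (core)",
    "Learnings (solo)",
)


def split_sections(markdown: str) -> dict[str, str]:
    """Split a role file into {heading: body} on level-2 headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_sections(markdown: str) -> list[str]:
    """List the expected sections a role file does not contain."""
    found = split_sections(markdown)
    return [name for name in EXPECTED_SECTIONS if name not in found]
```

A file with only a Charter and a Body would report the other three sections as missing, which is one simple way to spot a malformed role file before seating it.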
The consumer rows (the "annexes")
Context-specific bindings and lessons are not sections of the shared file — they are each consumer’s own registry row, so the single-writer rule holds:
| Lane | Lives in | Holds |
|---|---|---|
| Crew | a row in | model tier, tool scope, handoff contract, crew-specific learnings |
| Panel | a row in | lens emphasis, pairing notes, panel-specific learnings |
| Research | a row in | coverage angle, dedup / verification notes, research-specific learnings |
| Solo | the core file’s | solo-run lessons |
A row with no role: link is a purely local role — that is exactly how an orchestrator behaves when the
roles plugin isn’t installed.
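
The graceful-degradation behavior can be sketched as follows. This is a minimal illustration, not the plugins' loader: `RegistryRow`, `build_context`, and the in-memory `core_roles` dict are hypothetical stand-ins for a registry row, its optional `role:` link, and the shared core files.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegistryRow:
    """One consumer's registry row (hypothetical shape)."""
    name: str
    role_link: Optional[str] = None  # the optional `role:` link; None = purely local
    lane_learnings: list[str] = field(default_factory=list)


def build_context(row: RegistryRow, core_roles: dict[str, str]) -> list[str]:
    """Assemble what an invocation loads: shared core (if linked) plus its own row."""
    parts: list[str] = []
    # Only a linked row pulls in the shared core, and only if that core exists,
    # so an orchestrator installed without the roles plugin still works.
    if row.role_link and row.role_link in core_roles:
        parts.append(core_roles[row.role_link])
    parts.extend(row.lane_learnings)
    return parts
```

A row without a link, or a link that resolves to nothing because the shared core is absent, simply yields the lane's own material.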
Why lanes — shared identity, lane-scoped evolution
Sharing everything would be a bug, because different orchestrators teach a role different kinds of lessons:
- Crew teaches procedural lessons: "write your handoff to the run dir, don’t return text", "implement the contract, flag don’t absorb scope." Useful inside a gated relay; meaningless or wrong elsewhere.
- Panel teaches epistemic lessons: "judge by title + depicts, never the slug", "push back, disagreement is the point." Useful as a critique lens; directly contradicts crew’s "implement the contract" if applied in a relay.
Merged naively these contaminate each other (a solo run obeying run-dir procedures that don’t exist; a crew dev-phase adopting panel-style divergence that violates its contract). So procedural lessons stay in their lane. But context-independent knowledge — "title + depicts, never slug" is true everywhere — belongs to everyone. Moving that, and only that, to the shared core is the job of graduation.
Three evolution rules
1. Free-append only to your own lane. An invocation loads the shared core (if linked) plus its own row, never another consumer’s.
2. Core learnings arrive by graduation, never direct append. When a lesson appears in two lanes, or is plainly context-independent, it is promoted to `## Learnings (core)` and struck from the rows. This is the same append → graduate → prune loop the evolving-claude-md skill uses for CLAUDE.md, one level down. The `roles` plugin’s SessionStart hook surfaces candidates (a role used by 2+ of crew / panel / research, or a bloated solo-learnings section); it never rewrites a role file itself.
3. Identity edits are deliberate. Any orchestrator may propose a Charter or Body change; only the user applies it. A panel run silently rewriting the persona that crew will execute tomorrow is the one genuinely dangerous channel, so it is gated, and because everything is in git, every change is reviewable.
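
The "appears in two lanes" trigger for graduation could be sketched like this. It is an assumption-laden illustration: lessons are matched by exact text after trimming and lowercasing, and `graduation_candidates` and `graduate` are hypothetical helpers, not the SessionStart hook's logic. In line with the rules above, `graduate` is meant to run only after the user approves the promotion.

```python
from collections import defaultdict


def _key(lesson: str) -> str:
    # Assumed normalization: exact match after trimming and lowercasing.
    return lesson.strip().lower()


def graduation_candidates(lanes: dict[str, list[str]]) -> list[str]:
    """Return normalized lessons that appear in two or more lanes."""
    seen: dict[str, set[str]] = defaultdict(set)
    for lane, lessons in lanes.items():
        for lesson in lessons:
            seen[_key(lesson)].add(lane)
    return sorted(k for k, owners in seen.items() if len(owners) >= 2)


def graduate(lesson_key: str, core: list[str], lanes: dict[str, list[str]]) -> None:
    """Promote one approved lesson to the core list and strike it from every lane."""
    if lesson_key not in core:
        core.append(lesson_key)
    for lane in list(lanes):
        lanes[lane] = [l for l in lanes[lane] if _key(l) != lesson_key]
```

Procedural lessons that live in a single lane never show up as candidates, so they stay where they are.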
The no-downgrade principle
The guarantee that makes partial installs safe: a capability gates on the shared registry only if it intrinsically requires sharing. Composing a roster from a task’s axes, the phase-gate hook, qa hardening, and the escalation ladder need no shared pool, so they ship unconditionally in the 1.1.0 skills. The shared core’s exclusive value is only what sharing actually enables: cross-context learning, solo invocation, and one talent pool. No orchestrator is ever second-class standalone.
| Installed | Formation | Evolution | What the shared core adds when present |
|---|---|---|---|
| crew alone | ✅ dynamic compose path mints into | ✅ full loop in | cross-skill learning; one pool |
| panel alone | ✅ dynamic, as before | ✅ | seats gain shared, evolving identity |
| research-sweep alone | ✅ dynamic angle composition into | ✅ | cross-context learning + the shared verifier ( |
| roles alone | n/a | per-role solo annex + graduation audit | solo invocation of any role |
Crew’s escalation protocol
Delivery relays need a defined answer to "a role is stuck." dev-crew 1.1.0 adds one, unifying the old
qa-loop, the debugger/lead candidate roles, and the user’s "go back" steering into a single
mechanism.
The BLOCKED handoff
The missing primitive: a role that cannot meet its done-criteria writes its handoff with
status: BLOCKED plus what it tried, why it’s stuck, what it needs, and a suggested escalation target.
Deliver or declare — silent flailing (or confident-but-wrong output) is a defect. This pairs with the
phase-gate hook (scripts/check-handoffs.py, a PreToolUse hook): the hook accepts a BLOCKED
handoff as valid and routes it to the ladder, while still rejecting a missing one. Prompt discipline
drifts; hooks don’t.
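
The gate's three outcomes (pass, route to the ladder, reject) can be illustrated with a small check. This is a sketch under stated assumptions, not the contents of `scripts/check-handoffs.py`: the dict shape, the `DONE` status value, and the field names in `BLOCKED_FIELDS` are all hypothetical; only `status: BLOCKED` and the four pieces of information it must carry come from the text above.

```python
from typing import Optional

# Hypothetical field names for: what it tried, why it's stuck,
# what it needs, and a suggested escalation target.
BLOCKED_FIELDS = ("tried", "stuck_because", "needs", "escalate_to")


def check_handoff(handoff: Optional[dict]) -> str:
    """Classify a handoff as 'pass', 'escalate', or 'reject'."""
    if handoff is None:
        return "reject"  # a missing handoff is never acceptable
    status = handoff.get("status")
    if status == "BLOCKED":
        # A BLOCKED handoff is valid only if it explains itself fully.
        missing = [f for f in BLOCKED_FIELDS if not handoff.get(f)]
        return "reject" if missing else "escalate"
    # "DONE" is an assumed name for a completed handoff.
    return "pass" if status == "DONE" else "reject"
```

The point of the shape is that declaring a block is a first-class, checkable result, while saying nothing is the only outcome that always fails.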
The ladder (conductor-owned, each rung once per stumble)
1. Clarify & retry: re-delegate the same role with the missing context (one retry).
2. Re-tier vs 3. Re-role: a diagnosis, not a sequence (below).
3. (see 2)
4. Re-plan: when the contract itself is wrong, escalate up the relay to the architect; downstream artifacts are marked stale (a role-initiated "go back").
5. User: the ladder is exhausted, the issue is a genuine user decision (scope/topology forks skip straight here), or a cost gate fires.
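
One way to picture "each rung once per stumble" is a function that picks the next untried rung. The sketch below is a guess at the control flow, not the conductor's implementation: `Rung` and `next_rung` are hypothetical names, and the choice to try the other of rungs 2 and 3 after the diagnosed one fails is an assumption the text does not spell out.

```python
from enum import IntEnum


class Rung(IntEnum):
    CLARIFY_RETRY = 1
    RE_TIER = 2
    RE_ROLE = 3
    RE_PLAN = 4
    USER = 5


def next_rung(used: set[Rung], first: Rung = Rung.RE_TIER,
              needs_user: bool = False) -> Rung:
    """Pick the next rung for one stumble, never repeating a used rung.

    `first` is the diagnosis between rungs 2 and 3; it defaults to re-tier.
    `needs_user` covers genuine user decisions and cost gates, which skip
    straight to the last rung.
    """
    if needs_user:
        return Rung.USER
    # Assumption: the undiagnosed alternative is tried after the diagnosed one.
    second = Rung.RE_ROLE if first is Rung.RE_TIER else Rung.RE_TIER
    for rung in (Rung.CLARIFY_RETRY, first, second, Rung.RE_PLAN):
        if rung not in used:
            return rung
    return Rung.USER  # ladder exhausted
```

Tracking `used` per stumble is what keeps the conductor from looping on the same remedy.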
Re-tier vs re-role: read the BLOCKED report
| Diagnosis | Symptom | Action |
|---|---|---|
| Capability gap | role is right, the model is short: real progress, repeated near-misses, work exceeds the tier’s depth | Re-tier the same role (e.g. sonnet→opus). A different role would hit the same ceiling. |
| Ownership gap | the model is fine, the role is wrong: doing work its charter doesn’t own (dev looping on root-cause is the debugger’s job; cross-subsystem → lead), or needs tools its scope denies | Re-role to the failure-class owner (mint probationary via the compose path if missing). A heavier model here is a more expensive flail. |
Continuity heuristic: approach sound but execution short → re-tier (preserves artifacts, changes one
variable); approach itself suspect → re-role (fresh method). Unclear → re-tier first, which keeps lane
discipline (jumping straight to lead imports scope creep). Every escalation is logged (escalation: in
the run entry), which feeds the learning loop: repeated rung-2 hits are evidence for a permanent re-tier;
recurring rung-3 hops to a missing owner are the trigger to mint a new role.
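
The diagnosis above reduces to a small decision function. This is a simplified sketch: `BlockedReport` and its boolean fields are hypothetical summaries of what a conductor might read out of a BLOCKED report, and the real judgment is likely less mechanical.

```python
from dataclasses import dataclass


@dataclass
class BlockedReport:
    """Hypothetical summary of a BLOCKED handoff, as read by the conductor."""
    outside_charter: bool = False     # doing work its charter doesn't own
    needs_denied_tools: bool = False  # needs tools its scope denies
    approach_suspect: bool = False    # the method itself looks wrong


def diagnose(report: BlockedReport) -> str:
    """Return 're-role' for an ownership gap, otherwise 're-tier'."""
    if report.outside_charter or report.needs_denied_tools:
        return "re-role"  # ownership gap: a heavier model would not help
    if report.approach_suspect:
        return "re-role"  # continuity heuristic: a fresh method is needed
    # Capability gap, or unclear: re-tier first, which changes one variable
    # and preserves the artifacts produced so far.
    return "re-tier"
```

Defaulting to re-tier mirrors the "unclear, re-tier first" rule and keeps the escalation inside the role's lane.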
Using it
| Want to… | Do |
|---|---|
| Run one expert pass, fast | install |
| Build & ship something with gates | install |
| Decide what / whether; stress-test a plan | install |
| All of the above, sharing one evolving talent pool | install all three; registries link to shared cores automatically |
Field position
No surveyed public project has roles that accumulate experience across orchestration contexts. Large
agent catalogs (VoltAgent’s 154, wshobson’s 192) are static libraries; dynamic-selection panels learn
nothing between runs; the one pack that fuses prompts with a learning loop stores it in an opaque global
database. Dynamic formation plus lane-scoped evolving roles, shared across solo / crew / panel as
reviewable per-repo markdown, is — as of this writing — unique. See the full survey and rationale in the
project’s decision record, docs/decisions/2026-06-12-ecosystem-review.md.