Parallel Research Sweep

Tackles a broad, "exhaustive" research or cataloging question by partitioning it into non-overlapping angles, launching parallel research agents under one identical output contract, merging their results straight from the agent transcripts via jq (without dumping the JSONL back into context), and adversarially verifying the synthesized findings. Built for coverage that is exhaustive, not representative — and for doing it without burning output tokens re-writing every agent’s result.

Install

/plugin install parallel-research-sweep@alexmskills

Trigger it

/parallel-research-sweep:parallel-research-sweep every open-source vector database

Or ask in natural language: "give me an exhaustive catalog of public datasets for X" or "find as many open-source tools for Y as possible".

When to use it

  • "as many as possible", "exhaustive", "comprehensive", "every X", "all known …"

  • Catalogs, surveys, and landscapes — "all open-source tools for X", "every public dataset for Y"

  • Large-scale gathering where one agent’s coverage would be insufficient

  • Skip it for 5–20 items, a narrow domain one agent can cover, or data already in the repo

What it does

The method has two halves — fan-out and verification — and skipping the second produces a large but unreliable dataset.

Lock a schema, then partition

Commits to an exact output schema (format, top-level list, ≤5 required fields, ID disambiguation, no fabrication) before launching, and pastes it verbatim into every agent prompt so outputs merge cleanly. Then splits the topic into 5–7 disjoint slices (by source, era, geography, category, or corpus) so each agent owns an angle no other touches.

Fan out in parallel

Launches the agents in a single message with run_in_background: true, each prompt carrying the verbatim schema, its slice, source guidance, verification rules, a YAML-only output requirement, and a numeric volume floor. Handles the expected 1–2 partial failures by re-running, accepting partial output, or merging slices.

Merge without context blowup

Extracts each agent’s final message from its JSONL transcript with a jq + awk pipeline that strips the YAML fence and redirects to one catalog file — never Read-ing the transcripts. Sanity-checks with wc/grep and commits immediately so a compaction can’t lose the result.

Adversarially verify

Runs a fresh verifier agent that didn’t produce the data: spot-checks a 10–20% sample against primary sources, hunts for fabricated URLs/dates/IDs, checks cross-agent seams for duplicates, and tests the volume claim and gaps. Dedup, normalization, and URL verification follow as separate passes.

Notes

  • Key anti-patterns: re-outputting the whole catalog via Write, reading agent JSONL directly, reusing agent IDs across reruns, skipping the volume target, and skipping the verification half.

  • Agents without web access fabricate plausible-looking URLs and facts — verification is mandatory, not optional.