Parallel Research Sweep
Tackles a broad, "exhaustive" research or cataloging question by partitioning it into non-overlapping angles, launching parallel research agents under one identical output contract, merging their results straight from the agent transcripts via jq (without dumping the JSONL back into context), and adversarially verifying the synthesized findings. Built for coverage that is exhaustive, not representative — and for doing it without burning output tokens re-writing every agent’s result.
Trigger it
/parallel-research-sweep:parallel-research-sweep every open-source vector database
Or ask in natural language: "give me an exhaustive catalog of public datasets for X" or "find as many open-source tools for Y as possible".
When to use it
-
"as many as possible", "exhaustive", "comprehensive", "every X", "all known …"
-
Catalogs, surveys, and landscapes — "all open-source tools for X", "every public dataset for Y"
-
Large-scale gathering where one agent’s coverage would be insufficient
-
Skip it for 5–20 items, a narrow domain one agent can cover, or data already in the repo
What it does
The method has two halves — fan-out and verification — and skipping the second produces a large but unreliable dataset.
Lock a schema, then partition
Commits to an exact output schema (format, top-level list, ≤5 required fields, ID disambiguation, no fabrication) before launching, and pastes it verbatim into every agent prompt so outputs merge cleanly. Then splits the topic into 5–7 disjoint slices (by source, era, geography, category, or corpus) so each agent owns an angle no other touches.
Fan out in parallel
Launches the agents in a single message with run_in_background: true, each prompt carrying the verbatim schema, its slice, source guidance, verification rules, a YAML-only output requirement, and a numeric volume floor. Handles the expected 1–2 partial failures by re-running, accepting partial output, or merging slices.
Merge without context blowup
Extracts each agent’s final message from its JSONL transcript with a jq + awk pipeline that strips the YAML fence and redirects to one catalog file — never Read-ing the transcripts. Sanity-checks with wc/grep and commits immediately so a compaction can’t lose the result.
Adversarially verify
Runs a fresh verifier agent that didn’t produce the data: spot-checks a 10–20% sample against primary sources, hunts for fabricated URLs/dates/IDs, checks cross-agent seams for duplicates, and tests the volume claim and gaps. Dedup, normalization, and URL verification follow as separate passes.
Notes
-
Key anti-patterns: re-outputting the whole catalog via
Write, reading agent JSONL directly, reusing agent IDs across reruns, skipping the volume target, and skipping the verification half. -
Agents without web access fabricate plausible-looking URLs and facts — verification is mandatory, not optional.