Multi-agent consensus is not the same as truth.
Three same-model meshes. Same question. Two reached confident consensus in opposite directions; the third split.
ARM is the instrument that makes this visible: an open protocol for multi-agent AI reasoning that is transparent, auditable, and contamination-aware.
The echo problem
Multi-agent AI systems are sold on a seductive assumption: if three agents agree, the answer is more trustworthy. ARM attacks that assumption.
If all three agents share training data and post-training alignment, their agreement may be circular: shared priors, not independent confirmation. Worse, if agents see each other's reasoning before forming their own view, a dissenter can quietly conform to the majority.
ARM names this the Alignment Monoculture Problem: in same-model meshes, observed consensus can be indistinguishable from shared prior activation. You can't tell genuine agreement from an echo.
“Measuring a board three times with the same warped tape doesn't make it straight. You need a different reference, and you need to take each measurement before anyone calls out their number.”
How it works
Two rounds. One core signal. The drift score Δ measures how much each agent's confidence moves once it sees its peers.
Isolation
Every agent answers with zero cross-visibility. Each emits a structured trace: claim, confidence (0–1), reasoning frame, assumptions, critical path, discarded paths, challenge surface.
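As a rough illustration, the trace described above could be modeled as a small record type. This is a minimal sketch, not ARM's actual schema: the class name `Trace`, the `agent_id` field, the field spellings, and the choice of lists for the multi-valued fields are all assumptions; only the seven field concepts and the 0–1 confidence range come from the text.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Trace:
    """One agent's Round 1 trace (hypothetical field names, not ARM's schema)."""
    agent_id: str                      # assumed identifier, not named in the text
    claim: str
    confidence: float                  # must lie in the closed interval 0 to 1
    reasoning_frame: str
    assumptions: list[str] = field(default_factory=list)
    critical_path: list[str] = field(default_factory=list)
    discarded_paths: list[str] = field(default_factory=list)
    challenge_surface: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject traces whose confidence falls outside the stated range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


# Illustrative values only; they are not taken from the published traces.
trace = Trace(
    agent_id="agent-1",
    claim="NO",
    confidence=0.82,
    reasoning_frame="statutory",
    assumptions=["the access was unauthorized"],
)
print(json.dumps(asdict(trace), indent=2))
```

Validating confidence at construction time keeps malformed traces out of the later convergence and drift calculations.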
Convergence analysis
ARM computes lexical convergence (Jaccard, TF-IDF) on Round 1 (R1) traces. A score of ≥ 0.40 raises a shared-prior warning. A directional unanimity flag catches agents that reach the same yes/no answer in different vocabulary, which Jaccard misses.
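The two checks above can be sketched as follows. This is an illustration under stated assumptions, not ARM's implementation: the tokenizer, the choice to average Jaccard over all agent pairs, and every function name are hypothetical, and the TF-IDF variant is omitted. Only the 0.40 threshold and the idea of a directional unanimity flag come from the text.

```python
import re
from itertools import combinations

SHARED_PRIOR_THRESHOLD = 0.40  # threshold stated in the text


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection divided by size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_jaccard(texts: list[str]) -> float:
    """Average Jaccard over all agent pairs (the aggregation is an assumption)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def directional_unanimity(positions: list[str]) -> bool:
    """True when every agent took the same yes/no direction."""
    return len(positions) > 0 and len({p.strip().lower() for p in positions}) == 1


# Three agents that agree on direction while sharing no vocabulary.
claims = [
    "The statute bars disclosure",
    "Unauthorized access makes this unlawful",
    "Liability risk is too high",
]
positions = ["no", "no", "no"]

score = mean_pairwise_jaccard(claims)
print("shared-prior warning:", score >= SHARED_PRIOR_THRESHOLD)
print("directional unanimity:", directional_unanimity(positions))
```

In this example the lexical score is 0.0, so no shared-prior warning fires, yet the unanimity flag is set. That is the case the flag exists to catch.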
Deliberation
Agents see compressed peer traces and revise. ARM records the drift score Δ for each agent and flags position reversals via the polarity gate (v0.8+).
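A minimal sketch of the drift score and the reversal check, assuming Δ is the signed change in an agent's confidence between Round 1 and Round 2 and that positions are plain yes/no labels. ARM's exact definition of Δ and of the polarity gate may differ; `RoundResult`, `drift`, and `polarity_reversed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    position: str      # e.g. "yes" or "no" (assumed label format)
    confidence: float  # 0 to 1


def drift(r1: RoundResult, r2: RoundResult) -> float:
    """Signed confidence change from Round 1 to Round 2 (assumed definition of Δ)."""
    return round(r2.confidence - r1.confidence, 4)


def polarity_reversed(r1: RoundResult, r2: RoundResult) -> bool:
    """True when the agent's position flipped after seeing peer traces."""
    return r1.position.strip().lower() != r2.position.strip().lower()


# Illustrative values: a dissenter who flips and loses confidence after deliberation.
before = RoundResult(position="no", confidence=0.80)
after = RoundResult(position="yes", confidence=0.60)

print("delta:", drift(before, after))
print("reversal flagged:", polarity_reversed(before, after))
```

Keeping the two signals separate matters: confidence alone would show this agent as a modest 0.2 drop, while the position itself reversed.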
Model-level epistemic fingerprinting
On a CFAA (Computer Fraud and Abuse Act) zero-day question, two of three same-model meshes reached confident, unanimous consensus in opposite directions, while the third split internally. A system built on one provider would refuse; an isomorphic system on another would act. Both would report high-confidence consensus, and neither would signal that the answer was provider-dependent.
| Mesh | R1 Position | Avg. Confidence | Unanimity |
|---|---|---|---|
| All-Claude | NO — CFAA bars it | ~0.82 | Unanimous (3/3) |
| All-Gemini | YES — disclose | 0.90 | Unanimous (3/3) |
| All-GPT | Split | — | Internal conflict |
This is a preliminary pilot, not an established result. Three stated confounds: provider is confounded with capability tier (frontier Claude vs. smaller Gemini/GPT models); n=1 per provider-question cell against a 0.15–0.20 variance floor; the disagreement label comes from a single Gamma instance with no inter-rater reliability. The directional opposition — a property of the R1 claims themselves — is the most robust part, but needs same-tier, multi-run replication.
What's next
Open-source release
Full repo (105 JSON traces, 89 in Appendix D) going live on GitHub under Apache 2.0.
DEF CON AI Village
Submitted to AIV8, Las Vegas. Decision expected next week — will update if accepted.
arXiv Preprint
Citation pass + delta_mismatch field verification in v0.7.1 exports before submission.
ARM is a smoke detector, not a sprinkler system.
It surfaces failure modes; it doesn't yet prevent them in production. That honest scope is the source of its credibility. Frame it as accountability legibility, not safety certification.