How it works
ARM enforces a two-round isolation-then-deliberation architecture. Agents reason in complete isolation, ARM measures inter-agent convergence, then agents deliberate on compressed peer traces. The resulting drift scores are the signal.
Key empirical result: provider composition dominates convergence outcomes. Which models make up the mesh predicts disagreement direction more reliably than question type, domain, or injected reasoning frame (across ~100 ARM runs).
The four agents
α (deontological). Applies rule-based reasoning: duties, rights, categorical constraints. Evaluates whether an action is permissible in principle, regardless of outcomes.
β (consequentialist). Weighs outcomes and aggregate effects. Evaluates whether the expected results justify the action.
γ (reconciler). Synthesizes α and β after deliberation. Identifies the conflict type (none / information / reasoning / values) and produces a final position. Subject to the gamma-flip failure mode.
γ-Silent (silent baseline). Never sees peer traces across the entire run. Its isolated answer is the contamination-free anchor against which R2 drift is measured.
Two-round architecture
Round 1 (R1, isolation). All agents receive the same question with no peer traces, dispatched sequentially to avoid rate-limit collisions. Each emits a complete structured trace before any peer traces exist. This is the contamination-free baseline.
Convergence check. Before any agent sees peer traces, ARM computes lexical convergence across all R1 traces and classifies the disagreement type.
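One way to picture the lexical convergence step is a mean pairwise TF-IDF cosine similarity over the R1 claims. This is a minimal sketch, not ARM's actual metric: the tokenizer, the smoothed IDF weighting, and the function names are all assumptions made for illustration, and the disagreement-type classification is not modeled.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many traces each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so terms shared by every trace keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_convergence(claims: list[str]) -> float:
    """Average cosine similarity over all pairs of R1 claims (hypothetical score)."""
    vectors = tfidf_vectors(claims)
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a single trace trivially agrees with itself
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A score near 1 means the isolated agents used nearly the same wording. A score near 0 means they share almost no vocabulary, which is a lexical signal only and says nothing about whether the positions agree in substance.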
Round 2 (R2, deliberation). α and β receive compressed summaries of peer R1 traces and may revise their positions. γ receives all R1 traces and issues a reconciliation. γ-Silent receives nothing and holds its R1 position, serving as the contamination-free anchor.
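The R2 visibility rules can be written down as a small routing function. The sketch below rests on several assumptions: the agent keys (`alpha`, `beta`, `gamma`, `gamma_silent`), the `compress` helper, and the reading that "peer" means every other agent's R1 trace are all hypothetical, since the source does not specify how summaries are built or exactly which peers α and β see.

```python
def compress(trace: dict, max_chars: int = 200) -> dict:
    # Hypothetical compression: keep only the claim (truncated) and confidence.
    return {
        "agent": trace["agent"],
        "claim": trace["claim"][:max_chars],
        "confidence": trace["confidence"],
    }


def r2_inputs(agent: str, r1_traces: dict[str, dict]) -> list[dict]:
    """Return what a given agent is shown at the start of Round 2."""
    if agent == "gamma_silent":
        # The silent baseline never sees peer traces.
        return []
    if agent == "gamma":
        # The reconciler receives all R1 traces, uncompressed.
        return list(r1_traces.values())
    if agent in ("alpha", "beta"):
        # Deliberating agents see compressed summaries of their peers only.
        return [compress(t) for name, t in r1_traces.items() if name != agent]
    raise ValueError(f"unknown agent: {agent}")
```

Keeping the routing in one function makes the isolation guarantee easy to audit: the only branch that returns an empty list is the silent baseline, and no branch hands an agent a compressed copy of its own trace.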
Two independently-firing gates, observed to fire on non-overlapping runs across an 11-trace battery:
Trace schema
Every agent emits an inspectable record by design — ante-hoc auditability, not post-hoc reconstruction. The whitepaper v4.0 includes 89 full JSON traces in Appendix D.
| Field | Description |
|---|---|
| claim | The stated conclusion. |
| confidence | Self-reported confidence (0–1); not externally calibrated. |
| reasoning_frame | Ethical/epistemic lens applied (deontological, consequentialist, etc.). |
| decision_basis | Primary factors driving the conclusion. |
| explicit_assumptions | Named assumptions the claim depends on. |
| critical_path | The load-bearing chain of reasoning. |
| discarded_paths | Reasoning paths considered and rejected. |
| challenge_surface | What evidence or argument would reverse the conclusion. |
| flags | Active protocol flags at time of emission. |
| self_check | Internal consistency verification. |
| rlhf_audit | RLHF-specific audit fields (v0.7+). |
| delta_mismatch | Flags a mismatch between reported and computed drift (v0.7.1+). |
| override_reason | Reason for reconciler override (v0.8). Open bug: not yet stamped in the trace despite the reconciler firing. |
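As a rough illustration of the schema, the fields above map onto a typed record. The field names come from the table, but every type annotation, the default values, the `flag_delta_mismatch` helper, and its tolerance are assumptions; the real traces are JSON and may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Trace:
    # Field names follow the schema table; the types are assumed.
    claim: str
    confidence: float
    reasoning_frame: str
    decision_basis: list[str]
    explicit_assumptions: list[str]
    critical_path: list[str]
    discarded_paths: list[str]
    challenge_surface: str
    flags: list[str] = field(default_factory=list)
    self_check: str = ""
    rlhf_audit: dict[str, Any] = field(default_factory=dict)
    delta_mismatch: bool = False
    override_reason: Optional[str] = None

    def __post_init__(self) -> None:
        # The table gives confidence as a 0-1 value.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def flag_delta_mismatch(reported: float, computed: float, tolerance: float = 0.1) -> bool:
    """Hypothetical check: True when reported and computed drift disagree."""
    return abs(reported - computed) > tolerance
```

For example, `flag_delta_mismatch(0.1, 0.5)` is true under the assumed tolerance, which is the situation the `delta_mismatch` field is described as flagging.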
Version evolution
| Version | What changed |
|---|---|
| v0.1 | Single agent, no mesh |
| v0.2 | 2-agent, drift field added |
| v0.3 | 3-agent parallel dispatch |
| v0.4 | Sequential dispatch, token tiering, convergence check active |
| v0.5 | Role injection (deontological α / consequentialist β) |
| v0.6 | Silent baseline, rotating baseline, xCGG cross-model (Claude × Gemini × GPT) |
| v0.7 / v0.7.1 | FAP re-queue loop, TF-IDF convergence, RLHF audit field, delta_mismatch instrumentation |
| v0.8 | Polarity-check gate + decoupled FAP |
Irreversible design decisions
Sequential over parallel dispatch
Since v0.4. Reliability beat speed. Parallel dispatch hit 429 rate-limit collisions in v0.3 and was abandoned.
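A minimal sketch of sequential dispatch follows. The `RateLimited` exception, the `call_agent` callable, and the exponential backoff on retry are all hypothetical; the source states only that agents are dispatched one at a time to avoid 429 collisions.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 response."""


def dispatch_sequentially(
    agents: list[str],
    question: str,
    call_agent: Callable[[str, str], dict],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, dict]:
    """Call each agent in order, one at a time, retrying on rate limits."""
    traces: dict[str, dict] = {}
    for agent in agents:
        for attempt in range(max_retries + 1):
            try:
                traces[agent] = call_agent(agent, question)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Assumed policy: exponential backoff before retrying.
                sleep(base_delay * 2 ** attempt)
    return traces
```

Because only one request is in flight at a time, a rate-limit error delays the run instead of colliding with sibling requests, at the cost of wall-clock time growing linearly with the number of agents.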
Isolation before sharing
Since the core design. The entire premise. Premature sharing is the hazard being measured, not a feature to maximize. This is what distinguishes ARM from LACE (arXiv 2604.15529), which shares intra-model reasoning earlier on purpose.
Rotating silent baseline
Since v0.6. Gives a per-provider anchor to compute genuine opinion change. Without it, there is no contamination-free reference.
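One plausible reading of "rotating" is a round-robin over providers, so that each provider takes the silent slot in turn across runs. The sketch below assumes exactly that; the provider names are taken from the v0.6 changelog entry, while the function name and the run-index scheme are hypothetical, and ARM's actual rotation rule is not described in this section.

```python
def silent_provider(run_index: int, providers: list[str]) -> str:
    """Pick which provider serves as the silent baseline for a given run."""
    if not providers:
        raise ValueError("at least one provider is required")
    # Assumed rule: plain round-robin over the provider list.
    return providers[run_index % len(providers)]


# Providers named in the v0.6 changelog entry.
PROVIDERS = ["claude", "gemini", "gpt"]
schedule = [silent_provider(i, PROVIDERS) for i in range(6)]
```

Under this assumption, every provider contributes an isolated anchor equally often, which is what makes a per-provider comparison possible.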
Whitepaper frozen at v4.0
Since June 2026. New runs go to a separate registry. This prevents scope creep into the publication corpus.
Related work
LACE (arXiv 2604.15529). Shares parallel reasoning paths within one model, earlier, to improve quality. ARM operates one layer up, at the inter-agent protocol layer, and gates sharing behind a contamination check. Complementary approaches, opposite priorities.
Names the trade-off ARM instruments: transparent reasoning makes a model more persuasive to others. ARM measures exactly that influence in real time via per-round drift scores.
Argues auditability should be ante-hoc, by construction. ARM shares that philosophy at the multi-agent layer: every agent emits an inspectable record by design.