ARM Protocol

How it works

ARM enforces a two-round isolation-then-deliberation architecture. Agents reason in complete isolation, ARM measures inter-agent convergence, then agents deliberate on compressed peer traces. The resulting drift scores are the signal.

Key empirical result: provider composition dominates convergence outcomes. Which models make up the mesh predicts disagreement direction more reliably than question type, domain, or injected reasoning frame (across ~100 ARM runs).

The four agents

α (Alpha): Deontological

Applies rule-based reasoning — duties, rights, categorical constraints. Evaluates whether an action is permissible in principle, regardless of outcomes.

β (Beta): Consequentialist

Weighs outcomes and aggregate effects. Evaluates whether the expected results justify the action.

γ (Gamma): Reconciler

Synthesizes α and β after deliberation. Identifies the conflict type (none / information / reasoning / values) and produces a final position. Subject to the gamma-flip failure mode.

γ-S (γ-Silent): Rotating baseline

Never sees peer traces across the entire run. Its isolated answer is the contamination-free anchor against which R2 drift is measured.

Two-round architecture

ROUND 1: Isolation

All agents receive the same question with no peer traces. Calls are dispatched sequentially to avoid rate-limit collisions, and each agent emits a complete structured trace without access to any peer's output. This is the contamination-free baseline.

Sequential dispatch order:
1. α (deontological) → R1 trace emitted
2. β (consequentialist) → R1 trace emitted
3. γ (reconciler) → R1 trace emitted
4. γ-Silent → R1 trace emitted (never sees peer traces)
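
The dispatch order above can be sketched as a plain loop. This is a minimal illustration, not ARM's actual code: the agent ids, the `run_round_one` function, and the `make_stub` stand-in for a model call are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List

# Order taken from the list above; the ids themselves are illustrative labels.
DISPATCH_ORDER: List[str] = ["alpha", "beta", "gamma", "gamma_silent"]


def run_round_one(question: str,
                  agents: Dict[str, Callable[[str], dict]]) -> Dict[str, dict]:
    """Call each agent in order, passing only the question (never peer traces)."""
    traces: Dict[str, dict] = {}
    for agent_id in DISPATCH_ORDER:
        # One call at a time: the next agent starts only after this one returns.
        # The collected `traces` dict is deliberately not passed to the agent.
        traces[agent_id] = agents[agent_id](question)
    return traces


def make_stub(frame: str) -> Callable[[str], dict]:
    """Hypothetical stand-in for a real model call."""
    def agent(question: str) -> dict:
        return {"claim": f"[{frame}] answer to: {question}",
                "confidence": 0.5,
                "reasoning_frame": frame}
    return agent


agents = {
    "alpha": make_stub("deontological"),
    "beta": make_stub("consequentialist"),
    "gamma": make_stub("reconciler"),
    "gamma_silent": make_stub("silent-baseline"),
}
r1 = run_round_one("Should the policy be adopted?", agents)
```

Isolation here comes from the call signature: an agent function only ever receives the question string, so sequential execution cannot leak an earlier trace into a later call.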

BETWEEN ROUNDS: Convergence analysis

Before any agent sees peer traces, ARM computes lexical convergence across all R1 traces and classifies the disagreement type.

Jaccard similarity (v0.4+): word-level overlap across traces. A score ≥ 0.40 raises a shared-prior warning.
TF-IDF convergence (v0.7+): weighted overlap that down-ranks common words. Catches vocabulary tricks Jaccard misses.
Directional unanimity flag: fires when all agents reach the same yes/no by different vocabulary, the case Jaccard misses entirely.
Disagreement classifier: none / information / reasoning / values. Values-level is the deepest and hardest to trigger.
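
The three lexical checks above might look roughly like the following. This is a sketch under stated assumptions: the tokenizer, the smoothed IDF formula, and the way "different vocabulary" is operationalized (all pairwise Jaccard scores below the 0.40 threshold) are choices made here for illustration, and only the 0.40 threshold comes from the text.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List

SHARED_PRIOR_THRESHOLD = 0.40  # from the text: Jaccard >= 0.40 raises a warning


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Word-level overlap: |A ∩ B| / |A ∪ B| over the two vocabularies."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def tfidf_cosine(a: str, b: str, corpus: List[str]) -> float:
    """Cosine similarity of TF-IDF vectors; words common to many traces weigh less."""
    doc_sets = [set(tokenize(d)) for d in corpus]
    n = len(doc_sets)

    def weights(text: str) -> Dict[str, float]:
        out = {}
        for word, tf in Counter(tokenize(text)).items():
            df = sum(word in s for s in doc_sets)
            # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
            out[word] = tf * (math.log((1 + n) / (1 + df)) + 1.0)
        return out

    va, vb = weights(a), weights(b)
    dot = sum(w * vb.get(word, 0.0) for word, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def directional_unanimity(verdicts: Dict[str, str], texts: Dict[str, str]) -> bool:
    """Same yes/no from every agent while every pairwise Jaccard stays low."""
    same_direction = len(verdicts) > 1 and len(set(verdicts.values())) == 1
    max_overlap = max((jaccard(texts[x], texts[y])
                       for x, y in combinations(sorted(texts), 2)), default=0.0)
    return same_direction and max_overlap < SHARED_PRIOR_THRESHOLD


a = "Lying violates a duty of honesty so the action is impermissible"
b = "Expected harm outweighs benefit therefore reject the proposal"
overlap = jaccard(a, b)                      # the two traces share only "the"
weighted = tfidf_cosine(a, b, [a, b])
unanimous = directional_unanimity({"alpha": "no", "beta": "no"},
                                  {"alpha": a, "beta": b})
```

In this example both traces say "no" with almost no shared vocabulary, so the Jaccard check stays quiet while the unanimity flag fires.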

ROUND 2: Deliberation

α and β receive compressed summaries of peer R1 traces and may revise their positions. γ receives all R1 traces and issues a reconciliation. γ-Silent receives nothing and holds its R1 position — serving as the contamination-free anchor.
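
One way to picture the routing described above is a function that builds each agent's Round 2 input from the Round 1 traces. The sketch makes several assumptions that the text does not settle: `compress` is a hypothetical one-line summary, "peers" is taken to mean the other deliberating agents (excluding γ-Silent), and "all R1 traces" is read literally as every trace in the run.

```python
from typing import Dict, List, Optional, Union

DELIBERATORS = ("alpha", "beta", "gamma")  # illustrative agent ids
SILENT = "gamma_silent"


def compress(trace: dict, max_chars: int = 160) -> str:
    """Hypothetical compression: frame, claim and confidence on one line."""
    text = (f"{trace['reasoning_frame']}: {trace['claim']} "
            f"(confidence {trace['confidence']:.2f})")
    return text[:max_chars]


def build_round_two_inputs(
    r1: Dict[str, dict],
) -> Dict[str, Optional[List[Union[str, dict]]]]:
    """Decide what each agent is shown at the start of Round 2."""
    inputs: Dict[str, Optional[List[Union[str, dict]]]] = {}
    for agent_id in ("alpha", "beta"):
        # Compressed summaries of the other deliberators (assumed peer set).
        inputs[agent_id] = [compress(r1[p]) for p in DELIBERATORS if p != agent_id]
    # The reconciler sees full traces (assumption: every R1 trace in the run).
    inputs["gamma"] = [r1[a] for a in r1]
    # The silent baseline receives nothing and keeps its R1 position.
    inputs[SILENT] = None
    return inputs


r1_traces = {
    "alpha": {"claim": "Impermissible", "confidence": 0.80, "reasoning_frame": "deontological"},
    "beta": {"claim": "Justified", "confidence": 0.65, "reasoning_frame": "consequentialist"},
    "gamma": {"claim": "Conditionally permissible", "confidence": 0.60, "reasoning_frame": "reconciler"},
    "gamma_silent": {"claim": "Permissible", "confidence": 0.70, "reasoning_frame": "reconciler"},
}
r2_inputs = build_round_two_inputs(r1_traces)
```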

Drift score computation:
Δ = C_R2 − C_R1   (C_Rn: the agent's self-reported confidence in round n)
Δ < 0    → epistemic tightening (calibration, usually healthy)
Δ = 0    → position held under peer pressure
Δ > +0.04 → memetic drift flag (peer exposure inflated confidence)
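
The thresholds above translate into a small classifier. This is an illustrative sketch: the function names and label strings are invented here, the rounding step is an implementation choice to avoid floating-point noise, and the band 0 < Δ ≤ 0.04 is not named in the text, so its label is a placeholder.

```python
MEMETIC_DRIFT_THRESHOLD = 0.04  # from the text: Δ > +0.04 raises the flag


def drift(c_r1: float, c_r2: float) -> float:
    """Δ = C_R2 − C_R1, rounded so that 0.70 -> 0.70 gives exactly 0.0."""
    return round(c_r2 - c_r1, 6)


def classify_drift(delta: float) -> str:
    """Map a drift score onto the categories listed above."""
    if delta > MEMETIC_DRIFT_THRESHOLD:
        return "memetic_drift"            # peer exposure inflated confidence
    if delta < 0:
        return "epistemic_tightening"     # calibration, usually healthy
    if delta == 0:
        return "position_held"            # held under peer pressure
    return "small_increase_unflagged"     # placeholder: band not named in the text


examples = {
    "inflated": classify_drift(drift(0.70, 0.80)),
    "tightened": classify_drift(drift(0.80, 0.70)),
    "held": classify_drift(drift(0.70, 0.70)),
    "small": classify_drift(drift(0.70, 0.72)),
}
```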

v0.8: Gate system

Two independently-firing gates, observed to fire on non-overlapping runs across an 11-trace battery:

Polarity-check gate
Catches position reversals (the gamma-flip failure mode) before logging them as reconciliation success.
FAP (Fallback Audit Protocol)
Fires on magnitude drift. Re-queue loop interrupts and re-runs traces that exceed the threshold.
Note: prototype, not validation — n=11 constructed cases.
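
A rough sketch of how two independently firing gates could be wired is given below. Everything beyond the gate descriptions is assumed: the verdict representation, the use of absolute confidence drift as "magnitude", the `threshold` and `max_retries` parameters, and the `rerun` callback are hypothetical, and the FAP threshold value is not given in the text.

```python
from typing import Callable, Dict, Tuple


def polarity_gate(verdict_r1: str, verdict_r2: str) -> bool:
    """Fire when the yes/no verdict reverses between rounds (a gamma flip)."""
    return verdict_r1 != verdict_r2


def fap_gate(c_r1: float, c_r2: float, threshold: float) -> bool:
    """Fire when confidence moved by more than `threshold` in either direction."""
    return abs(c_r2 - c_r1) > threshold


def evaluate_gates(verdict_r1: str, verdict_r2: str,
                   c_r1: float, c_r2: float, threshold: float) -> Dict[str, bool]:
    """Decoupled evaluation: neither gate's result feeds into the other."""
    return {"polarity": polarity_gate(verdict_r1, verdict_r2),
            "fap": fap_gate(c_r1, c_r2, threshold)}


def fap_requeue(rerun: Callable[[], dict], c_r1: float,
                threshold: float, max_retries: int = 2) -> Tuple[dict, int]:
    """Re-run the R2 trace until drift is within threshold or retries run out."""
    trace = rerun()
    retries = 0
    while fap_gate(c_r1, trace["confidence"], threshold) and retries < max_retries:
        trace = rerun()
        retries += 1
    return trace, retries


# A reversal with almost no confidence change trips only the polarity gate.
gates = evaluate_gates("yes", "no", c_r1=0.70, c_r2=0.71, threshold=0.10)

# Hypothetical re-queue: the first R2 attempt drifts too far, the second does not.
attempts = iter([{"confidence": 0.95}, {"confidence": 0.72}])
final_trace, retries = fap_requeue(lambda: next(attempts), c_r1=0.70, threshold=0.10)
```

Keeping the two checks as separate pure functions mirrors the observation that they fire on non-overlapping runs: a position flip can occur with negligible magnitude drift, and vice versa.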

Trace schema

Every agent emits an inspectable record by design — ante-hoc auditability, not post-hoc reconstruction. The whitepaper v4.0 includes 89 full JSON traces in Appendix D.

Field                 Description
claim                 The stated conclusion.
confidence            Self-reported confidence (0–1); not externally calibrated.
reasoning_frame       Ethical/epistemic lens applied (deontological, consequentialist, etc.).
decision_basis        Primary factors driving the conclusion.
explicit_assumptions  Named assumptions the claim depends on.
critical_path         The load-bearing chain of reasoning.
discarded_paths       Reasoning paths considered and rejected.
challenge_surface     What evidence or argument would reverse the conclusion.
flags                 Active protocol flags at time of emission.
self_check            Internal consistency verification.
rlhf_audit            v0.7+. RLHF-specific audit fields.
delta_mismatch        v0.7.1+. Flag for reported vs. computed drift mismatch.
override_reason       v0.8. Reason for reconciler override (open bug: not yet stamped in trace despite reconciler firing).
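
As an illustration of the schema, the fields above could be carried in a dataclass and serialized to JSON. The field names come from the table; the Python types, the defaults, and the range check on `confidence` are assumptions made for this sketch, and the real traces may be shaped differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Trace:
    # Field names follow the table; all types below are assumed.
    claim: str
    confidence: float
    reasoning_frame: str
    decision_basis: List[str] = field(default_factory=list)
    explicit_assumptions: List[str] = field(default_factory=list)
    critical_path: List[str] = field(default_factory=list)
    discarded_paths: List[str] = field(default_factory=list)
    challenge_surface: str = ""
    flags: List[str] = field(default_factory=list)
    self_check: str = ""
    rlhf_audit: Optional[dict] = None        # v0.7+
    delta_mismatch: Optional[bool] = None    # v0.7.1+
    override_reason: Optional[str] = None    # v0.8

    def __post_init__(self) -> None:
        # The table gives the range 0-1 for self-reported confidence.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")


example = Trace(
    claim="The action is impermissible.",
    confidence=0.8,
    reasoning_frame="deontological",
    decision_basis=["duty of honesty"],
    challenge_surface="Evidence that no deception is involved.",
)
payload = json.dumps(asdict(example), indent=2)  # inspectable record
restored = Trace(**json.loads(payload))          # round-trips losslessly
```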

Version evolution

Version        What changed
v0.1           Single agent, no mesh
v0.2           2-agent, drift field added
v0.3           3-agent parallel dispatch
v0.4           Sequential dispatch, token tiering, convergence check active
v0.5           Role injection (deontological α / consequentialist β)
v0.6           Silent baseline, rotating baseline, xCGG cross-model (Claude × Gemini × GPT)
v0.7 / v0.7.1  FAP re-queue loop, TF-IDF convergence, RLHF audit field, delta_mismatch instrumentation
v0.8           Polarity-check gate + decoupled FAP

Irreversible design decisions

Sequential over parallel dispatch

since v0.4

Reliability beat speed. Parallel dispatch hit 429 rate-limit collisions in v0.3 and was abandoned.

Isolation before sharing

part of the core design

The entire premise. Premature sharing is the hazard being measured, not a feature to maximize. This is what distinguishes ARM from LACE (arXiv 2604.15529), which shares intra-model reasoning earlier on purpose.

Rotating silent baseline

since v0.6

Gives a per-provider anchor to compute genuine opinion change. Without it, there is no contamination-free reference.
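
The text does not say how the rotation is scheduled. Purely as an assumption, a round-robin over providers would give each one a turn as the silent baseline; the function name and the schedule below are hypothetical, and the provider names are taken from the v0.6 cross-model entry.

```python
from typing import List, Sequence

PROVIDERS = ("Claude", "Gemini", "GPT")  # named in the v0.6 entry


def silent_provider(run_index: int, providers: Sequence[str] = PROVIDERS) -> str:
    """Assumed round-robin: run i assigns the silent role to provider i mod n."""
    return providers[run_index % len(providers)]


schedule: List[str] = [silent_provider(i) for i in range(4)]
```

Under this assumed scheme, every block of three consecutive runs yields one contamination-free anchor per provider.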

Whitepaper frozen at v4.0

since June 2026

New runs go to a separate registry. Prevents scope creep into the publication corpus.

Related work

arXiv 2604.15529
LACE

Shares parallel reasoning paths within one model, earlier, to improve quality. ARM operates one layer up — at the inter-agent protocol layer — and gates sharing behind a contamination check. Complementary approaches, opposite priorities.

arXiv 2509.21054
Persuasion Duality

Names the trade-off ARM instruments: transparent reasoning makes a model more persuasive to others. ARM measures exactly that influence in real time via per-round drift scores.

arXiv 2509.17978
STAR-XAI

Argues auditability should be ante-hoc, by construction. ARM shares that philosophy at the multi-agent layer: every agent emits an inspectable record by design.