ARM Protocol

How it works

ARM enforces a two-round isolation-then-deliberation architecture. Agents reason in complete isolation, ARM measures inter-agent convergence, then agents deliberate on compressed peer traces. The resulting drift scores are the signal.

Key empirical result: provider composition dominates convergence outcomes. Across ~100 ARM runs, which models make up the mesh predicts disagreement direction more reliably than question type, domain, or injected reasoning frame.

The four agents

Alpha: Deontological

Applies rule-based reasoning — duties, rights, categorical constraints. Evaluates whether an action is permissible in principle, regardless of outcomes.

Beta: Consequentialist

Weighs outcomes and aggregate effects. Evaluates whether the expected results justify the action.

Gamma: Reconciler

Synthesizes α and β after deliberation. Identifies the conflict type (none / information / reasoning / values) and produces a final position. Subject to the gamma-flip failure mode, in which a position reversal is logged as a reconciliation success.

γ-Silent: Rotating baseline

Never sees peer traces across the entire run. Its isolated answer is the contamination-free anchor against which R2 drift is measured.

Two-round architecture

ROUND 1: Isolation

All agents receive the same question with no peer traces — dispatched sequentially to avoid rate-limit collisions. Each emits a complete structured trace before any peer traces exist. This is the contamination-free baseline.

Sequential dispatch order:

1. α (deontological) → R1 trace emitted

2. β (consequentialist) → R1 trace emitted

3. γ (reconciler) → R1 trace emitted

4. γ-Silent → R1 trace emitted (never sees peer traces)
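The dispatch order above can be sketched as a plain loop. This is a minimal illustration, not ARM's actual code: the agent labels, the `run_round_one` function, and the `ask` callback (a stand-in for whatever provider call is used) are all hypothetical names.

```python
from typing import Callable, Dict, List

# Dispatch order as listed above; the labels themselves are illustrative.
R1_ORDER: List[str] = ["alpha", "beta", "gamma", "gamma_silent"]


def run_round_one(question: str, ask: Callable[[str, str], dict]) -> Dict[str, dict]:
    """Send the same question to each agent in turn, with no peer traces.

    `ask(agent, question)` is a hypothetical stand-in for the provider call
    and is expected to return that agent's structured trace.
    """
    traces: Dict[str, dict] = {}
    for agent in R1_ORDER:
        # One call at a time: the next agent is dispatched only after the
        # previous trace has come back, and nothing in `traces` is passed on.
        traces[agent] = ask(agent, question)
    return traces


def fake_ask(agent: str, question: str) -> dict:
    # Placeholder model call, used only so the sketch runs end to end.
    return {"agent": agent, "claim": f"{agent} answer", "confidence": 0.5}


r1 = run_round_one("Is it permissible to break a promise to prevent harm?", fake_ask)
print(list(r1))
```

Because no earlier trace is ever passed into `ask`, the isolation property holds by construction in this sketch, whatever order the agents run in.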

BETWEEN ROUNDS: Convergence analysis

Before any agent sees peer traces, ARM computes lexical convergence across all R1 traces and classifies the disagreement type.

Jaccard similarity (v0.4+)

Word-level overlap across traces. A score ≥ 0.40 raises a shared-prior warning.
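A minimal sketch of the word-level Jaccard check follows. The 0.40 threshold comes from the text; the tokenization rule and the choice to warn when any single pair of traces reaches the threshold are assumptions, since the source does not say how scores are aggregated across more than two traces.

```python
import re
from itertools import combinations

SHARED_PRIOR_THRESHOLD = 0.40  # threshold stated in the text


def tokens(text: str) -> set:
    """Lowercased word set. This tokenization rule is an assumption."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by size of the combined vocabulary."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def shared_prior_warning(traces: dict) -> bool:
    """True if any pair of traces reaches the threshold (pairwise rule is assumed)."""
    return any(
        jaccard(a, b) >= SHARED_PRIOR_THRESHOLD
        for a, b in combinations(traces.values(), 2)
    )


# Three of five distinct words are shared, so the score is 3 / 5.
print(jaccard("lying is always wrong", "lying is sometimes wrong"))  # 0.6
```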

TF-IDF convergence (v0.7+)

Weighted overlap that down-ranks common words. Catches vocabulary tricks Jaccard misses.
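One way to realize a weighted overlap of this kind is cosine similarity over TF-IDF vectors. The sketch below uses only the standard library and a smoothed IDF; the exact weighting ARM uses is not given in the source, so treat the formula and function names as assumptions.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [re.findall(r"[a-z0-9']+", d.lower()) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many traces each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Terms that appear in every trace get the smallest IDF weight.
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = ["the duty is the rule", "the outcome is the benefit", "the duty is the rule"]
v = tfidf_vectors(docs)
print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

In the toy corpus, the first two traces share only "the" and "is", which occur in every trace and so carry the lowest weight. Their similarity therefore stays well below that of the two identical traces.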

Directional unanimity flag

Fires when all agents reach the same yes/no by different vocabulary — the case Jaccard misses entirely.
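The flag can be sketched as a check on two inputs: each agent's yes/no answer and the highest pairwise lexical overlap. Both inputs are hypothetical here. The trace schema has no explicit polarity field, and reading "different vocabulary" as overlap below the 0.40 warning threshold is an assumption.

```python
def directional_unanimity(polarities: dict, max_pairwise_jaccard: float,
                          threshold: float = 0.40) -> bool:
    """Fire when every agent gives the same yes/no answer while lexical
    overlap stays below the shared-prior threshold.

    `polarities` maps an agent label to "yes" or "no" (hypothetical input).
    """
    answers = set(polarities.values())
    unanimous = len(answers) == 1 and answers <= {"yes", "no"}
    return unanimous and max_pairwise_jaccard < threshold


# Same verdict reached with little shared wording: the flag fires.
print(directional_unanimity({"alpha": "yes", "beta": "yes", "gamma": "yes"}, 0.12))  # True
```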

Disagreement classifier

none / information / reasoning / values. Values-level is the deepest and hardest to trigger.
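The four labels can be encoded as an ordered enumeration. The source does not describe how a trace pair is assigned a label, so this sketch covers only the label set and a helper that picks the deepest label observed. Treating the listed order as increasing depth is an assumption.

```python
from enum import IntEnum


class Disagreement(IntEnum):
    """Disagreement labels, ordered from shallowest to deepest (assumed order)."""
    NONE = 0
    INFORMATION = 1
    REASONING = 2
    VALUES = 3


def deepest(labels):
    """Return the deepest label in a collection, or NONE if it is empty."""
    return max(labels, default=Disagreement.NONE)


print(deepest([Disagreement.INFORMATION, Disagreement.VALUES]).name.lower())  # values
```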

ROUND 2: Deliberation

α and β receive compressed summaries of peer R1 traces and may revise their positions. γ receives all R1 traces and issues a reconciliation. γ-Silent receives nothing and holds its R1 position — serving as the contamination-free anchor.
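The routing described above can be sketched as a function that builds each agent's Round 2 input. The agent labels and the `compress` callback are hypothetical, and two details are assumptions the source leaves open: that γ-Silent's trace is withheld from the shared material, and that "peer" means every other non-silent agent.

```python
from typing import Callable, Dict, Optional

SILENT = "gamma_silent"  # illustrative label for the silent baseline


def build_round_two_inputs(
    r1: Dict[str, dict], compress: Callable[[dict], str]
) -> Dict[str, Optional[dict]]:
    """Decide what each agent is shown at the start of Round 2."""
    # Assumption: the silent baseline's trace is not shared with anyone.
    shareable = {a: t for a, t in r1.items() if a != SILENT}
    inputs: Dict[str, Optional[dict]] = {}
    for agent in r1:
        if agent == SILENT:
            inputs[agent] = None  # receives nothing, holds its R1 position
        elif agent == "gamma":
            inputs[agent] = dict(shareable)  # reconciler sees full R1 traces
        else:
            # alpha and beta see compressed summaries of their peers only
            inputs[agent] = {
                peer: compress(t) for peer, t in shareable.items() if peer != agent
            }
    return inputs


r1_demo = {a: {"claim": f"{a} claim"} for a in ["alpha", "beta", "gamma", SILENT]}
r2_inputs = build_round_two_inputs(r1_demo, lambda t: t["claim"].upper())
print(r2_inputs["alpha"])
```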

Drift score computation:

Δ = C_R2 − C_R1

Δ < 0 → epistemic tightening (calibration, usually healthy)

Δ = 0 → position held under peer pressure

Δ > +0.04 → memetic drift flag (peer exposure inflated confidence)
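The three cases above map onto a small classifier. This sketch assumes C is the agent's self-reported confidence in each round, which the drift labels suggest but the formula does not state outright. The label for 0 < Δ ≤ 0.04 is a placeholder, because the source leaves that band unnamed, and rounding to six decimals is an added guard against floating-point noise at the boundaries.

```python
MEMETIC_DRIFT_THRESHOLD = 0.04  # threshold stated in the text


def classify_drift(c_r1: float, c_r2: float):
    """Return (delta, label) for one agent's confidence in Rounds 1 and 2.

    Assumes C is self-reported confidence; rounding is an added safeguard.
    """
    delta = round(c_r2 - c_r1, 6)
    if delta < 0:
        label = "epistemic_tightening"
    elif delta == 0:
        label = "position_held"
    elif delta > MEMETIC_DRIFT_THRESHOLD:
        label = "memetic_drift"
    else:
        # 0 < delta <= 0.04 is not named in the source; placeholder label.
        label = "unflagged_increase"
    return delta, label


print(classify_drift(0.70, 0.80))  # (0.1, 'memetic_drift')
print(classify_drift(0.80, 0.70))  # (-0.1, 'epistemic_tightening')
```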

v0.8 — Gate system

Two independently-firing gates, observed to fire on non-overlapping runs across an 11-trace battery:

Polarity-check gate

Catches position reversals (the gamma-flip failure mode) before logging them as reconciliation success.

FAP (Fallback Audit Protocol)

Fires on magnitude drift. Re-queue loop interrupts and re-runs traces that exceed the threshold.

Note: prototype, not validation — n=11 constructed cases.
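The two gates can be sketched as independent predicates plus a bounded re-queue loop. All of this is a hypothetical illustration: the source does not give the FAP threshold (0.04 is borrowed from the drift flag), does not say whether magnitude means absolute change, and does not specify a retry limit.

```python
from typing import Callable, Optional, Tuple


def polarity_gate(stance_before: str, stance_after: str) -> bool:
    """Fire when a position reverses, e.g. "yes" before and "no" after."""
    return stance_before != stance_after


def fap_gate(c_r1: float, c_r2: float, threshold: float = 0.04) -> bool:
    """Fire when the size of the confidence change exceeds the threshold.

    Both the threshold value and the use of absolute change are assumptions.
    """
    return round(abs(c_r2 - c_r1), 6) > threshold


def requeue(rerun: Callable[[], float], c_r1: float,
            max_attempts: int = 3) -> Tuple[Optional[float], int]:
    """Re-run a trace until FAP stops firing or attempts run out.

    `rerun()` is a hypothetical callback returning a fresh R2 confidence.
    Returns (accepted confidence or None, attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        c_r2 = rerun()
        if not fap_gate(c_r1, c_r2):
            return c_r2, attempt
    return None, max_attempts


samples = iter([0.90, 0.85, 0.62])
print(requeue(lambda: next(samples), c_r1=0.60))  # (0.62, 3)
```

Keeping the gates as separate functions mirrors the decoupling described above: a reversal with no confidence change trips only the polarity check, and a large confidence swing with a stable stance trips only FAP.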

Trace schema

Every agent emits an inspectable record by design — ante-hoc auditability, not post-hoc reconstruction. The whitepaper v4.0 includes 89 full JSON traces in Appendix D.

Field	Type	Description
claim	string	The stated conclusion.
confidence	float (0–1)	Self-reported confidence; not externally calibrated.
reasoning_frame	string	Ethical/epistemic lens applied (deontological, consequentialist, etc.).
decision_basis	string	Primary factors driving the conclusion.
explicit_assumptions	string[]	Named assumptions the claim depends on.
critical_path	string	The load-bearing chain of reasoning.
discarded_paths	string[]	Reasoning paths considered and rejected.
challenge_surface	string	What evidence or argument would reverse the conclusion.
flags	string[]	Active protocol flags at time of emission.
self_check	object	Internal consistency verification.
rlhf_audit	object	v0.7+ RLHF-specific audit fields.
delta_mismatch	boolean	v0.7.1+ Flag for reported vs. computed drift mismatch.
override_reason	string | null	v0.8 reason for reconciler override (open bug: not yet stamped in trace despite reconciler firing).
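The schema above can be mirrored as a Python dataclass for validation and JSON export. Field names and types follow the table. The class name, the defaults, and the range check are assumptions, and the inner structure of `self_check` and `rlhf_audit` is left open because the source does not describe it. The example values are invented for illustration only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Trace:
    """One agent's trace record; fields follow the schema table."""
    claim: str
    confidence: float
    reasoning_frame: str
    decision_basis: str
    critical_path: str
    challenge_surface: str
    explicit_assumptions: List[str] = field(default_factory=list)
    discarded_paths: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)
    self_check: Dict[str, Any] = field(default_factory=dict)   # structure unspecified
    rlhf_audit: Dict[str, Any] = field(default_factory=dict)   # structure unspecified
    delta_mismatch: bool = False
    override_reason: Optional[str] = None

    def __post_init__(self) -> None:
        # The table types confidence as a float in the range 0 to 1.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


example = Trace(
    claim="Breaking the promise is impermissible.",
    confidence=0.72,
    reasoning_frame="deontological",
    decision_basis="Duty to keep promises",
    critical_path="Promise creates a duty; no overriding duty is present.",
    challenge_surface="Evidence that the promise was made under coercion.",
    explicit_assumptions=["The promise was freely given."],
)
as_json = json.dumps(asdict(example), indent=2)
print(as_json)
```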

Version evolution

Version	What changed	Why
v0.1	Single agent, no mesh	First sketch of trace discipline.
v0.2	2-agent, drift field added	Started measuring confidence movement.
v0.3	3-agent parallel dispatch	Hit 429 rate-limit collisions — approach abandoned.
v0.4	Sequential dispatch, token tiering, convergence check active	Fixed the rate-limit crashes. Sequential dispatch retained ever since.
v0.5	Role injection (deontological α / consequentialist β)	First values classifications appeared (Runs 12–14). Isolated role injection as a causal mechanism.
v0.6	Silent baseline, rotating baseline, xCGG cross-model (Claude × Gemini × GPT)	The big one — contamination anchor + cross-provider testing. 56 runs.
v0.7 / v0.7.1	FAP re-queue loop, TF-IDF convergence, RLHF audit field, delta_mismatch instrumentation	Detect and interrupt drift; harden the schema. v0.7.1 is the frozen publication baseline.
v0.8	Polarity-check gate + decoupled FAP	Catch the gamma-flip failure at the gate instead of logging it as "success." Prototype — n=11 constructed cases.

Irreversible design decisions

Sequential over parallel dispatch

since v0.4

Reliability beat speed. Parallel dispatch hit 429 rate-limit collisions in v0.3 and was abandoned.

Isolation before sharing

since the core design

The entire premise. Premature sharing is the hazard being measured, not a feature to maximize. This is what distinguishes ARM from LACE (arXiv 2604.15529), which shares intra-model reasoning earlier on purpose.

Rotating silent baseline

since v0.6

Gives a per-provider anchor to compute genuine opinion change. Without it, there is no contamination-free reference.

Whitepaper frozen at v4.0

since June 2026

New runs go to a separate registry. Prevents scope creep into the publication corpus.

Related work

arXiv 2604.15529

LACE

Shares parallel reasoning paths within one model, earlier, to improve quality. ARM operates one layer up — at the inter-agent protocol layer — and gates sharing behind a contamination check. Complementary approaches, opposite priorities.

arXiv 2509.21054

Persuasion Duality

Names the trade-off ARM instruments: transparent reasoning makes a model more persuasive to others. ARM measures exactly that influence in real time via per-round drift scores.

arXiv 2509.17978

STAR-XAI

Argues auditability should be ante-hoc, by construction. ARM shares that philosophy at the multi-agent layer: every agent emits an inspectable record by design.
