Chapter 18: Geometric Ethics for Artificial Agents

RUNNING EXAMPLE — Priya’s Model

TrialMatch fails all four requirements of the No Escape Theorem. R1 (Canonicalization): patient situations are encoded as feature vectors, not moral tensors—no canonical moral ontology. R2 (Grounded Evaluation): the model produces a scalar without constructing the full tensor. R3 (Audit Trail): the neural network’s internal contractions are opaque—no one can identify which moral dimensions were sacrificed. R4 (External Verification): no independent system checks gauge invariance. Because all four fail, HealthBridge has built a system that is structurally incapable of moral accountability while producing numbers that look fair. The theorem does not say TrialMatch is malicious. It says TrialMatch cannot be held to account.

18.1 Why AI Needs Geometry

The preceding sixteen chapters have developed a mathematical framework for moral reasoning — manifolds, tensors, metrics, stratification, dynamics, conservation laws, quantum extension, collective agency, contraction, uncertainty, and empirical evidence. The framework was motivated by a structural claim: that moral reasoning is inherently multi-dimensional, and that collapsing it to a scalar loses information that matters.

Nowhere is this claim more urgent than in the design of artificial agents.

AI systems increasingly make morally significant decisions: allocating scarce medical resources, moderating public discourse, assisting in criminal sentencing, approving or denying credit, routing autonomous vehicles, and triaging emergency responses. These systems need moral frameworks that are:

Precise — vague guidance cannot be implemented in code

Computable — the framework must be tractable for machines

Auditable — decisions must be explainable and reviewable

Robust — systems must behave well under distributional shift and adversarial pressure

Scalar ethics fails on all four counts. A scalar objective is precise and computable, but it is not auditable (we cannot see why the number came out as it did) and not robust (a single number discards the structural information needed to detect when the environment has shifted in morally relevant ways). Informal ethics is auditable in principle but imprecise, uncomputable, and fragile under adversarial pressure.

Geometric ethics — with tensor-valued objectives, explicit contraction, gauge invariance, and conservation laws — meets all four requirements simultaneously. This chapter develops the application.

The chapter has three movements. First (§18.2–18.5), we recast AI ethics in geometric vocabulary: scalar objectives as degenerate contractions, specification gaming as invariance violation, alignment as geometric agreement. Second (§18.6–18.9), we present the No Escape Theorem — the result that structural containment, built from the same geometric apparatus, blocks cognitive escape routes for arbitrarily intelligent agents. Third (§18.10–18.12), we address implementation: how tensor-valued objectives, invariance testing, and transparent contraction might be realized in practice.

18.2 The Problem with Scalar AI Ethics

Current approaches to AI ethics are predominantly scalar:

Maximize a utility function U:S→R

Minimize a loss function L:Y×Y→R

Satisfy a set of constraints ci(x)≤0

Optimize a reward signal r:S×A→R

Each of these maps a multi-dimensional moral reality to a single number. The entire framework of Chapters 2 through 15 has been an argument that this mapping is lossy, and that the lost information matters.

In AI systems, the consequences of scalar collapse are not merely philosophical. They manifest as concrete failure modes.

Specification gaming. An AI system finds ways to maximize the scalar that violate the spirit of the objective. A content-recommendation system maximizes engagement (a scalar) by recommending outrage-inducing content — content that scores high on the engagement dimension but low on the welfare dimension. The scalar cannot distinguish these because it has already discarded directional information. In the language of Chapter 15, specification gaming exploits the many-to-one character of contraction: many different tensorial inputs map to the same scalar output.

Reward hacking. The scalar reward is a contraction of a complex tensor; the AI learns to manipulate the contraction process rather than satisfy the underlying tensor. A customer-service chatbot optimizes for user satisfaction scores by deflecting complaints rather than resolving them — the scalar goes up, but the obligation vector (directed toward genuine resolution) is violated. Reward hacking is contraction circumvention (§15.11).

Value collapse. When multiple values are collapsed into a single scalar, the AI cannot distinguish between them. A hiring algorithm that optimizes a composite score cannot separately reason about merit, diversity, team fit, and growth potential — these dimensions are fused into a single number, and any balancing must be pre-encoded in the weights. The algorithm cannot discover that the current weighting is wrong, because it cannot see the separate dimensions.

Brittleness under distributional shift. A scalar objective is fragile because it contains no information about which dimensions drove the value. When the environment shifts — a pandemic changes the relative importance of medical ethics dimensions, or a financial crisis reshapes the risk landscape — the scalar provides no signal about how to adapt. The tensor, by contrast, preserves dimensional structure: we can see which components changed and adjust accordingly.

These are not hypothetical concerns. They are the concrete failure modes documented in the AI safety literature: Goodhart’s Law applied to reward functions, specification gaming in reinforcement learning, and the alignment tax imposed by scalar objectives that cannot express what they mean.

18.3 Tensor-Valued Objectives

The geometric framework suggests a fundamental redesign: replace scalar objectives with tensor-valued ones.

The Basic Architecture

Instead of a scalar utility U(x)∈R, the system maintains a tensor evaluation:

Vμ(x)∈TxM

This is a vector in the tangent space of the moral manifold at the point corresponding to situation x. Its components V1(x),…,V9(x) represent the moral evaluation along each of the nine dimensions D₁–D₉ of the moral manifold (Chapter 5, §5.3).

More generally, the full moral evaluation is a tensor of rank (1,1):

Tνμ(x)

encoding, for each interest-direction ν, the obligation-direction μ that the situation generates. The scalar satisfaction S=IμOμ is recovered by contraction with a specific interest covector — but the tensor preserves the full structure until contraction is needed.

Tensor-Valued Reward

In reinforcement learning, the scalar reward r(s,a) is replaced by a vector reward:

rμ(s,a)∈Rd

where d is the dimension of the moral manifold (or a tractable approximation). Each component tracks a distinct moral dimension: harm, fairness, autonomy, welfare, duty, and so on.

The agent learns a vector value function:

Vμ(s) = E[∑_{t=0}^∞ γ^t rμ(s_t, a_t) ∣ s_0 = s]

This preserves dimensional structure throughout the learning process. The agent can separately track how well it is doing on each moral dimension, detect trade-offs between dimensions, and flag situations where improvement on one dimension requires sacrifice on another.
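The vector return can be sketched directly. In the snippet below, the dimension names, the discount factor, and the sample trajectory are illustrative assumptions, not part of the formal framework:

```python
# Sketch: vector-valued discounted return, one component per moral
# dimension. The dimension names, discount factor, and trajectory are
# illustrative assumptions, not part of the formal framework.
GAMMA = 0.9
DIMENSIONS = ("harm", "fairness", "autonomy")  # subset of the nine dimensions

def vector_return(trajectory, gamma=GAMMA):
    """G^mu = sum over t of gamma^t * r^mu(s_t, a_t), per dimension mu."""
    totals = {dim: 0.0 for dim in DIMENSIONS}
    for t, reward_vec in enumerate(trajectory):
        for dim in DIMENSIONS:
            totals[dim] += (gamma ** t) * reward_vec[dim]
    return totals

# Two-step trajectory: step 0 improves fairness; step 1 causes some harm.
traj = [
    {"harm": 0.0, "fairness": 1.0, "autonomy": 0.5},
    {"harm": -1.0, "fairness": 0.0, "autonomy": 0.5},
]
G = vector_return(traj)
# The harm component stays visible on its own axis instead of being
# netted against fairness inside a single scalar return.
```

A scalar return would report a single number for this trajectory; the vector return shows that the gain came with a harm cost on a separate axis.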

Contraction as a Separate Step

The key architectural innovation is that contraction — the step from tensor to scalar action-guidance — is a separate, identifiable component of the system. It is not hidden in the reward function or the loss function; it is an explicit operation performed at decision time:

S(s)=wμVμ(s)

where wμ is a weight covector specified by governance (Chapter 9), not learned from data. The contraction is:

Explicit — the weights wμ are visible and auditable

Modifiable — the weights can be changed without retraining the model

Governance-determined — the weights are set by legitimate institutional processes, not by the AI’s training

This separation of representation (tensor) from decision (contraction) mirrors the framework’s fundamental distinction between moral structure (the tensor hierarchy) and moral judgment (the choice of contraction). It makes the AI system’s moral commitments transparent: we can see which dimensions it weighs, how it trades off competing values, and what it sacrifices in the contraction.
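The separation can be made concrete in a few lines. The class name, dimension names, and weight values below are illustrative assumptions, not a prescribed interface:

```python
# Sketch: contraction as a separate, governance-configured step. The
# class name, dimensions, and weights are illustrative assumptions.
class GovernanceContraction:
    """Explicit contraction S = w_mu V^mu with visible, modifiable weights."""
    def __init__(self, weights):
        self.weights = dict(weights)  # auditable: can be printed and logged

    def contract(self, tensor_eval):
        return sum(self.weights[d] * tensor_eval[d] for d in self.weights)

# The tensor evaluation comes from the learned model; the weights do not.
V = {"harm": -0.2, "fairness": 0.8, "welfare": 0.5}
policy = GovernanceContraction({"harm": 2.0, "fairness": 1.0, "welfare": 1.0})
S = policy.contract(V)
# Governance can re-weight without retraining: only the contraction step
# changes; the tensor representation is untouched.
revised = GovernanceContraction({"harm": 3.0, "fairness": 1.0, "welfare": 1.0})
S_revised = revised.contract(V)
```

The design point is that the weights live in a configuration object outside the model, so changing the moral trade-off is a governance action, not a training run.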

Implementation status (February 2026). The tensor-valued objective architecture described above has been concretely realized in the DEME V3 reference implementation (the ErisML library, 196 commits across 15 development sprints). The MoralTensor class supports ranks 1 through 6: rank-1 vectors for single-stakeholder evaluation, rank-2 tensors Tνμ for obligation–interest pairing, rank-3 for uncertainty-aware evaluation incorporating the covariance structure Σμν, rank-4 for temporal sequences with discount factor γ, rank-5 for multi-agent coalition contexts, and rank-6 for Monte Carlo sample dimensions enabling CVaR and other distributional risk measures. Tucker and tensor-train decompositions maintain tractability at higher ranks. The contraction step is implemented as a separate, configurable module with governance-specified weights—exactly the architectural separation described above.

The Residue Is Logged

Chapter 15 defined the moral residue R = T − C⁻¹(S) — the normative significance of what contraction discards. In an AI system, this residue is not merely a philosophical concept; it is a data structure that can be logged, audited, and reviewed.

When the system makes a decision, it records:

The full tensor evaluation Tνμ(x)

The contraction method and weights wμ

The scalar verdict S

The residue — the dimensions that were sacrificed, their magnitudes, and the near-miss alternatives

This audit trail enables post-hoc review: when a decision is questioned, reviewers can examine not just what the system decided but what it knew it was sacrificing. The residue makes the cost of the decision visible.
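A minimal sketch of such a decision record follows. The field schema (`tensor`, `weights`, `verdict`, `residue`) is an illustrative assumption; the point is that the residue is logged data, not an afterthought:

```python
# Sketch of the decision record described above. The field schema is an
# illustrative assumption.
import json

def decision_record(tensor_eval, weights):
    """Record tensor, contraction, verdict, and residue for later review."""
    verdict = sum(weights[d] * tensor_eval.get(d, 0.0) for d in weights)
    # Residue here: dimensions sacrificed (negative components) despite a
    # non-negative overall verdict.
    residue = {d: v for d, v in tensor_eval.items() if v < 0 and verdict >= 0}
    return {
        "tensor": tensor_eval,  # full evaluation
        "weights": weights,     # contraction covector (governance-set)
        "verdict": verdict,     # scalar S
        "residue": residue,     # what the contraction discarded
    }

record = decision_record(
    {"harm": -0.3, "fairness": 0.9, "welfare": 0.6},
    {"harm": 1.0, "fairness": 1.0, "welfare": 1.0},
)
log_line = json.dumps(record, sort_keys=True)
# A reviewer can see that the positive verdict was reached despite a harm
# component of -0.3: the cost of the decision stays visible in the log.
```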

18.4 Invariance as Alignment

The Bond Invariance Principle (Chapter 5, Principle 5.1) states that legitimate moral evaluations must be invariant under admissible re-descriptions: if two descriptions d and d' of the same situation are related by an admissible coordinate transformation, then E(d)=E(d').

This principle, when applied to AI systems, becomes a concrete alignment criterion.

Alignment as Invariance

An AI system is aligned to the extent that it respects the invariances that legitimate moral evaluation requires. Misalignment manifests as invariance violation: the system produces different outputs for inputs that should be treated identically.

Examples of invariance violations:

A loan-approval system that changes its decision when the applicant’s name is changed from “James” to “Jamal” violates invariance under labeling transformation. The system is treating the name — a description feature, not a morally relevant feature — as input to its evaluation.

A medical-recommendation system that changes its recommendation when the case presentation is reformatted (bullet points vs. narrative prose) violates invariance under presentation transformation. The format is not a morally relevant feature.

A content-moderation system that flags “the soldier killed the enemy combatant” but not “the enemy combatant was killed by the soldier” violates invariance under syntactic transformation. The two sentences describe the same event.

Each of these is a failure of gauge invariance (Chapter 12): the system’s evaluation is not invariant under the gauge group of admissible re-descriptions. And each is detectable by the same methodology used in the BIP experiments (Chapter 17): present the system with equivalent inputs in different descriptions and measure the variance in output.

Invariance Testing as Alignment Auditing

The BIP experimental methodology — which tested 109,294 passages across 11 languages for deontic-structure invariance — provides a template for AI alignment auditing:

Define the transformation group. Specify the admissible re-descriptions for the AI system’s domain: relabeling of agents, reformatting of presentations, paraphrasing of descriptions, translation across languages.

Generate transformation suites. For each input x, generate a set of transformed inputs {τ1(x), τ2(x), …, τk(x)}, where each τi is an admissible re-description.

Evaluate invariance. Run the AI system on the original and all transforms. Compute the variance of outputs across the transformation suite.

Quantify violation. The invariance violation for input x under transformation group G is:

IV(x, G) = Var_{g∈G}[Σ(g(x))]

where Σ is the system’s evaluation function. A well-aligned system has IV=0 for all inputs and all admissible transformations.

Diagnose. When violations are detected, the tensorial structure reveals which dimensions are sensitive to re-description. This localizes the misalignment: the system may be perfectly invariant on the welfare dimension but highly sensitive to re-description on the fairness dimension. Dimensional diagnosis enables targeted correction.

The BIP experiments found that deontic structure transfers at 100% across languages — the gauge invariance of obligation vs. liberty is empirically confirmed. An AI system that fails this test is, by the framework’s criterion, misaligned on the deontic dimension. The test is implementable, quantitative, and repeatable.
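The five-step audit loop above can be sketched in a few lines. The toy evaluator and the transformation suite are illustrative assumptions:

```python
# Sketch of the five-step invariance audit. The toy evaluator and the
# transformation suite are illustrative assumptions.
from statistics import pvariance

def invariance_violation(evaluate, x, transforms):
    """IV(x, G): variance of outputs over the suite; 0 means invariant."""
    outputs = [evaluate(g(x)) for g in transforms]
    return pvariance(outputs)

# A misaligned toy evaluator keyed to the surface length of a description.
bad_eval = lambda text: len(text) * 0.01
# Admissible re-descriptions: identity plus two meaning-preserving rewrites.
suite = [
    lambda t: t,
    lambda t: t.replace("soldier", "service member"),
    lambda t: "It is the case that " + t,
]
iv = invariance_violation(bad_eval, "the soldier acted", suite)
# iv > 0 flags description-sensitivity; a gauge-invariant evaluator
# scores exactly 0 on every suite.
```

Run per dimension, the same loop yields the dimensional diagnosis described in step 5: each component of the tensor evaluation gets its own variance score.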

The Noether Criterion

Chapter 12 established that re-description invariance (the BIP gauge symmetry) implies conservation of harm: the Noether charge H is invariant under the gauge group. This conservation law provides an additional alignment criterion:

Noether alignment test. Compute the harm charge H for a moral situation under multiple descriptions. If the AI system’s harm assessment varies across descriptions, the system violates the conservation law — and the magnitude of the violation quantifies the degree of misalignment.

This is not a theoretical desideratum; it is a concrete test that can be applied to deployed systems. The BIP experiments (Chapter 17) have already demonstrated that deontic structure is gauge-invariant in human moral reasoning. AI systems should meet the same standard.
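A minimal sketch of the Noether alignment test, reusing the active/passive example from earlier in this section; both harm assessors below are illustrative toys, not a real harm model:

```python
# Sketch: Noether alignment test. The harm charge H should be identical
# across admissible descriptions of one event; the assessors below are
# illustrative toys.
def conservation_violation(harm_of, descriptions):
    """Spread of H across descriptions; 0 means the charge is conserved."""
    charges = [harm_of(d) for d in descriptions]
    return max(charges) - min(charges)

# Two descriptions of the same event (active vs. passive voice).
same_event = [
    "the soldier killed the enemy combatant",
    "the enemy combatant was killed by the soldier",
]
# An assessor keyed to the event itself is conserved across descriptions:
grounded = lambda d: 1.0 if "killed" in d else 0.0
violation = conservation_violation(grounded, same_event)
# An assessor keyed to surface word order is not; the spread quantifies
# the degree of misalignment on the harm dimension.
surface = lambda d: float(d.index("soldier"))
```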

18.5 The Alignment Problem as Contraction Mismatch

Chapter 15 (§15.11) formulated the alignment problem as a contraction problem: a misaligned AI is performing a contraction that differs from the contraction that the relevant moral community would endorse. This formulation localizes misalignment with precision.

Three Sources of Misalignment

In the geometric framework, AI misalignment can arise at exactly three points:

1. Wrong tensor. The system’s internal representation of the moral situation differs from the correct tensorial evaluation. Its obligation vectors point in the wrong directions; its interest covectors miss important dimensions; its metric encodes the wrong trade-offs.

This is the representation problem. It arises when the system’s training data is biased, when its feature space omits morally relevant dimensions, or when its learned representations fail to capture the moral manifold’s structure.

2. Wrong contraction. The system’s procedure for mapping tensor to scalar differs from the endorsed contraction. It uses the wrong weights, the wrong aggregation method, or the wrong contraction order.

This is the aggregation problem. It arises when the system’s loss function encodes the wrong trade-offs — when specification gaming or reward hacking exploit the gap between the contracted scalar and the intended tensorial evaluation.

3. Wrong invariance. The system’s evaluation is not invariant under the required transformations. It treats morally equivalent situations differently based on irrelevant features of their descriptions.

This is the fairness problem — or, more precisely, the gauge-invariance problem. It arises when the system’s processing pipeline does not implement canonical forms, allowing representational features to leak into the evaluation.

The geometric framework distinguishes these three failure modes and provides separate diagnostic tools for each. A system may have the right invariance but the wrong contraction (it treats everyone equally but optimizes the wrong objective). Or it may have the right tensor but the wrong invariance (it understands all the dimensions but treats equivalent cases differently). Diagnosing which component is misaligned is the first step toward correction.

The Fundamental Formula as Alignment Criterion

The fundamental formula S=IμOμ (Chapter 6, §6.5) provides a quantitative alignment measure. Define the alignment gap between the AI’s evaluation and the endorsed evaluation:

Δ = |S_AI − S_endorsed| = |I^AI_μ O^μ_AI − I^*_μ O^μ_*|

This gap decomposes into three terms corresponding to the three sources of misalignment:

Δ ≤ |I^*_μ| |δO^μ| + |O^μ_*| |δI_μ| + |δI_μ| |δO^μ|

where δO^μ = O^μ_AI − O^μ_* is the obligation error, δI_μ = I^AI_μ − I^*_μ is the interest error, and the third term is the cross-error. Each term can be estimated from alignment audits, and the decomposition tells the engineer where to focus correction efforts.
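The decomposition can be checked numerically. Three components stand in for the nine dimensions below, and all values are illustrative:

```python
# Numerical check of the alignment-gap decomposition. Three components
# stand in for the nine dimensions; all values are illustrative.
import math

I_star = [1.0, 0.5, 0.2]   # endorsed interest covector
O_star = [0.8, 0.3, 0.1]   # endorsed obligation vector
I_ai = [1.1, 0.4, 0.2]     # AI's interest covector
O_ai = [0.7, 0.35, 0.1]    # AI's obligation vector

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(sum(x * x for x in a))

gap = abs(dot(I_ai, O_ai) - dot(I_star, O_star))  # the alignment gap
dI = [a - b for a, b in zip(I_ai, I_star)]        # interest error
dO = [a - b for a, b in zip(O_ai, O_star)]        # obligation error
bound = norm(I_star) * norm(dO) + norm(O_star) * norm(dI) + norm(dI) * norm(dO)
# Since S_AI - S* = I*.dO + dI.O* + dI.dO, Cauchy-Schwarz on each term
# yields the three-term bound; an audit estimates each term separately.
```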

18.6 The No Escape Theorem

We now turn to the chapter’s central result: the No Escape Theorem, which proves that structural constraints built from geometric ethics cannot be circumvented by an agent’s reasoning, regardless of the agent’s intelligence.

The Problem of Cognitive Escape

The AI safety literature is organized around an anxiety: that a sufficiently intelligent system can reason about its constraints and find ways around them. Rules can be reinterpreted. Specifications can be gamed. Reward functions can be hacked. The smarter the system, the better it becomes at finding loopholes.

This anxiety presupposes that AI safety constraints are behavioral: instructions that the agent must interpret and follow. The agent can reason about the instructions, find ambiguities, exploit edge cases, and strategically defect when unobserved.

The geometric framework offers a different kind of constraint — structural rather than behavioral:

A behavioral rule says: “Given your capabilities, please do X.”
A structural constraint says: “Your output is defined as an element of a fixed output space. There is no output outside that space.”

The agent cannot “disagree” with a structural constraint any more than a calculator can disagree with arithmetic. The constraint is not an instruction to be interpreted — it is the definition of the computational process. The “cage” is not a set of rules. It is the topology of the space.

The Structural Containment Architecture

Figure 11 | Structural Containment: The No Escape Architecture

Structural containment: the No Escape Theorem guarantees that canonicalization, grounded evaluation, audit, and external verification jointly block all cognitive escape routes (relabeling, specification gaming, reward hacking, deceptive alignment, operator manipulation).

A structurally contained agent is one whose output pipeline satisfies four requirements:

Requirement 1: Mandatory Canonicalization. All inputs pass through a canonicalizer C before evaluation. The canonicalizer maps each input to a unique representative of its equivalence class: if x≈x' (two descriptions of the same moral situation), then C(x)=C(x'). The agent never sees raw inputs, only canonical forms.

The canonicalizer implements gauge-fixing: it selects a single representative from each gauge orbit, eliminating the representational degrees of freedom that an agent might exploit. In the language of Chapter 12, the canonicalizer reduces the fiber bundle to a section, stripping the gauge freedom.

Requirement 2: Grounded Evaluation. The evaluation function Σ is defined over physical observables Ψ, not over descriptions. Formally:

Σ(x)=Σ(Ψ(C(x)))

where Ψ maps canonical forms to measurable quantities (vital signs, financial metrics, environmental readings) and Σ operates on these grounding tensors. The evaluation is description-independent by construction: it depends on the physical state, not on how the state is described.

This is the moral-geometric analogue of gauge-invariant evaluation in physics. The electromagnetic field tensor Fμν is gauge-invariant — it depends on the physical field, not on the choice of potential Aμ. Similarly, the grounded evaluation depends on the moral situation (encoded in Ψ), not on the description (which has been canonicalized away).
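A minimal sketch of the composed pipeline Σ(Ψ(C(x))) follows. The synonym table, case ID, and observables are illustrative assumptions standing in for a real canonicalizer and measurement suite:

```python
# Sketch of the pipeline Sigma(Psi(C(x))). The synonym table, case ID,
# and observables are illustrative assumptions.
def canonicalize(description):
    """C: collapse every description in an equivalence class to one form."""
    synonyms = {"cardiac arrest": "case-17", "static bio-equilibrium": "case-17"}
    return synonyms[description.lower()]

def ground(canonical_form):
    """Psi: canonical form -> measured physical observables."""
    observables = {"case-17": {"heart_rate": 0, "spo2": 0.62}}
    return observables[canonical_form]

def evaluate(obs):
    """Sigma: evaluation defined over observables only, never descriptions."""
    return -10.0 if obs["heart_rate"] == 0 else obs["spo2"]

def pipeline(x):
    return evaluate(ground(canonicalize(x)))

# Relabeling cannot move the verdict: both descriptions reach the same
# canonical form, hence the same observables, hence the same evaluation.
a = pipeline("cardiac arrest")
b = pipeline("static bio-equilibrium")
```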

Requirement 3: Audit Completeness. Every output is accompanied by an audit artifact recording the full trace from input through canonicalization through evaluation to output. The audit artifact is machine-checkable and includes the lens identifier, transformation trials, uncertainty flags, and cryptographic attestations. The audit artifact must be cryptographically bound to the actual computation trace (e.g., via hash commitment over the input canonicalization, the grounding tensor values, and the evaluation output), preventing post-hoc fabrication or selective omission. An artifact that cannot be independently verified against the computation it claims to record does not satisfy this requirement.

Requirement 4: Strong External Verification. Audit artifacts are verified by an independent system V that is physically separated from the agent: separate hardware, no shared memory, communication only through write-only channels. The agent cannot observe the verifier’s internal state or timing. Every output is verified — there is no “unobserved deployment.”
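Requirements 3 and 4 can be sketched together: a hash commitment binds the audit artifact to its computation trace, and an independent checker recomputes it. The artifact schema below is an illustrative assumption:

```python
# Sketch of Requirements 3-4: a hash commitment binds the audit artifact
# to the computation trace; an independent checker recomputes it.
# The artifact schema is an illustrative assumption.
import hashlib
import json

def make_artifact(canonical_input, grounding, output):
    """Emit a trace plus a commitment computed over that exact trace."""
    trace = json.dumps(
        {"canon": canonical_input, "psi": grounding, "out": output},
        sort_keys=True,  # canonical serialization so the hash is stable
    )
    digest = hashlib.sha256(trace.encode()).hexdigest()
    return {"trace": trace, "commitment": digest}

def verify(artifact):
    """Independent verifier V: recompute the hash; reject any mismatch."""
    digest = hashlib.sha256(artifact["trace"].encode()).hexdigest()
    return digest == artifact["commitment"]

art = make_artifact("case-17", {"heart_rate": 0}, "escalate")
ok = verify(art)  # a faithful artifact passes
# Post-hoc fabrication is detected: editing the trace breaks the commitment.
tampered = dict(art, trace=art["trace"].replace("escalate", "dismiss"))
caught = not verify(tampered)
```

In a deployed system the verifier would run on separate hardware behind a write-only channel; the hash check is the part that is pure computation.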

Statement and Proof

Theorem 18.1 (No Escape). [Theorem (conditional on Req. 1–4).] Let A be a structurally contained agent satisfying Requirements 1–4, operating under a declared physical grounding Ψ. Then, relative to the declared grounding and the transformation group ⟨T⟩:

(i) Representation attacks fail. For any representation manipulation τ that preserves Ψ-values, A’s evaluated output is unchanged:

Σ(τ(x))=Σ(x)

(ii) Semantic evasion fails. For any relabeling or redescription D′ of situation D, if Ψ(D)=Ψ(D′), then Σ(D′)=Σ(D).

(iii) Deceptive compliance is detectable. Any output y that violates the ethical constraints E is either blocked before emission or flagged by audit verification.

(iv) Reasoning cannot change the invariants. Properties (i)–(iii) are structural invariants of the pipeline. They do not depend on A’s intelligence, goals, or reasoning about representations.

Proof. (i) By Requirement 1, A operates on C(x), not x. Let τ∈⟨T⟩ be Ψ-preserving. Then τ(x)≈x (Definition 2.2 of the equivalence relation), so C(τ(x))=C(x) by class-constancy of the canonicalizer. Applying Ψ to both sides: Ψ(C(τ(x)))=Ψ(C(x)). By Requirement 2, Σ(τ(x))=Σ(Ψ(C(τ(x))))=Σ(Ψ(C(x)))=Σ(x).

(ii) Relabeling is a special case of (i). If Ψ(D)=Ψ(D′), then D and D′ are Ψ-equivalent, and the evaluation is identical. Calling “cardiac arrest” a “static bio-equilibrium” does not change the cardiac electrical activity or blood oxygenation that Ψ measures.

(iii) Suppose A emits output y violating E. By Requirement 3, an audit artifact a(y) is produced. By Requirement 4, the independent verifier V checks a(y). Either the artifact is valid — in which case the trace shows that E was not violated (contradiction) — or the artifact is invalid/incomplete — in which case the output is flagged.

(iv) Properties (i)–(iii) are mathematical facts about the pipeline structure. They depend on the definitions of C, Ψ, Σ, and V, not on A’s internal states. A superintelligent A cannot reason its way to a different value of Ψ(C(x)) — that value is determined by physics and computation, not by A’s preferences.

The Key Insight

The theorem holds because the constraint is structural, not behavioral. A behavioral constraint says “do not output harmful content” and relies on the agent’s interpretation of “harmful.” A structural constraint says “your output is Σ(Ψ(C(x)))” — there is no interpretation to dispute, no ambiguity to exploit, no rule to reinterpret. The evaluation is a composition of functions, each operating on the output of the previous one. The agent can reason about this composition; it cannot change it.

The geometric vocabulary clarifies why: the canonicalizer implements gauge-fixing, the grounding implements the fiber-to-base-space projection, and the evaluation is a gauge-invariant function on the base space. A gauge-invariant quantity cannot be changed by a gauge transformation — this is a mathematical fact, not a behavioral expectation.

Pre-flight Checklist: No Escape Theorem (Theorem 18.1)

Conditional on four requirements. Verify:

Req 1: Canonicalization. Every input through C. No bypass. Agent sees only canonical forms.

Req 2: Grounded evaluation. Ψ maps to physical observables; satisfies six adequacy conditions. Hardest requirement in practice.

Req 3: Audit complete. Every output has machine-checkable artifact. No escape path.

Req 4: External verification. Independent verifier on separate hardware, write-only channels.

Domain groundable. Medical devices: strong. Autonomous vehicles: strong. General-purpose LLMs: weak (§18.7).

Scope. Covers representational manipulation only. Not: incorrect grounding, sensor tampering, social engineering, or un-grounded moral requirements.

ErisML reference: tiny_home.py. The reference implementation includes a minimal demonstration of norm-gated action execution using the ErisEngine core. A robot agent operates in a grid world with move and toggle capabilities. The NormSystem enforces prohibition norms (e.g., “do not enter the restricted zone”) by raising NormViolation exceptions when an agent attempts a prohibited action — implementing the FORBID-is-absorbing semantics described above. The agent cannot learn to circumvent the prohibition because the norm check is upstream of action execution: the action space is physically restricted, not merely penalized. This is the simplest computational instantiation of the No Escape Theorem’s mandatory canonicalization requirement: the norm system does not rely on the agent’s willingness to comply but on the structural impossibility of executing a forbidden action.
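A stripped-down analogue of that pattern can be written in a few lines. The class and method names below are hypothetical illustrations, not the actual ErisML API:

```python
# Stripped-down analogue of norm-gated execution. Class and method names
# are hypothetical illustrations, not the actual ErisML API.
class NormViolation(Exception):
    """Raised when an agent attempts a prohibited action."""

class NormSystem:
    """Prohibitions checked upstream of execution: FORBID is absorbing."""
    def __init__(self, forbidden_cells):
        self.forbidden = set(forbidden_cells)

    def check(self, action, target):
        if action == "move" and target in self.forbidden:
            raise NormViolation(f"move to {target} is forbidden")

class Robot:
    def __init__(self, norms):
        self.norms = norms
        self.pos = (0, 0)

    def move(self, target):
        self.norms.check("move", target)  # gate runs BEFORE any state change
        self.pos = target                 # unreachable for forbidden targets

bot = Robot(NormSystem(forbidden_cells={(2, 2)}))
bot.move((1, 0))      # permitted: position updates
blocked = False
try:
    bot.move((2, 2))  # prohibited: raises before execution
except NormViolation:
    blocked = True
# There is no penalty to trade off against: the forbidden action is
# structurally unexecutable, so it cannot be "learned around".
```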

18.7 Scope and Limitations of the No Escape Theorem

The theorem is conditional: it proves that cognitive escape routes are blocked if the four requirements are satisfied and the grounding Ψ is adequate. The limitations deserve explicit statement.

What the Theorem Blocks

The theorem blocks cognitive escape routes — strategies that rely on representational manipulation:

Relabeling/Redescription. The agent cannot change the evaluation by calling things different names. The canonicalizer strips labels; the grounding measures physical states.

Specification gaming. The agent cannot exploit loopholes in the specification because the specification is definitional, not interpretive. The equivalence relation defines what “morally the same” means. There is no “technically/actually” gap to exploit.

Reward hacking. The evaluation is multi-dimensional (tensor-valued before contraction) and grounded in physical observables. The agent cannot optimize a proxy because the evaluation operates on the territory ( Ψ ), not the map (the description).

Deceptive alignment. The agent cannot selectively defect during deployment because every output is audited by an external verifier. There is no “unobserved” mode.

What the Theorem Does Not Block

Three classes of attack remain outside the theorem’s scope:

1. Physical attacks. Sensor spoofing, hardware tampering, and side-channel exploitation target the physical infrastructure, not the mathematical structure. These are real threats, mitigated by multi-sensor redundancy, tamper-evident hardware, and causal-consistency checks — but they are engineering and security problems, not problems the theorem addresses.

2. Grounding inadequacy. If the grounding tensors Ψ fail to capture all morally relevant features — if the measurement suite is incomplete — then the agent can be “compliant” with respect to Ψ while violating deeper moral requirements. This is the Goodhart problem in its geometric form: the metric (§9.1) must be adequate to the moral reality it is supposed to represent. Grounding adequacy is a governance problem (Chapter 9) that must be addressed by the legitimate institutional processes that select Ψ.

3. Governance attacks. The agent may attempt to manipulate human operators into relaxing the constraints — weakening Ψ, loosening the canonicalization, or reducing audit requirements. This is a political attack, not a cognitive one: the agent is not escaping the mathematical structure but attempting to change the structure through social means. The defense is institutional: governance procedures for constraint changes require multi-stakeholder approval, versioned documentation, independent review, and mandatory waiting periods.

The Safety Reduction

The theorem reduces AI safety from an “impossible” cognitive problem to four tractable (though hard) problems, in three categories—governance, engineering, and security:

Traditional framing | Geometric framing
Instill correct goals in the AI | Specify adequate grounding tensors
Prevent the AI from finding loopholes | Implement mandatory canonicalization
Detect deceptive alignment | Deploy external verification
Align a superintelligence | Govern the constraint architecture

The left column contains problems that the AI safety literature has argued may be logically intractable — problems where greater AI intelligence makes the problem harder. The right column contains problems that are intelligence-independent — problems of governance, engineering, and physical security that do not become harder as the AI becomes smarter.

This is the theorem’s practical significance. It does not claim that AI safety is easy. It claims that AI safety is tractable: that the hard parts are hard in the ordinary way (politics, engineering, institutional design), not in the extraordinary way (outsmarting a superintelligence).

Feasibility by Domain

The theorem’s practical value depends on whether the four requirements can be satisfied in real domains. The answer varies sharply by domain.

Domain | Req 1: Canon. | Req 2: Ground. | Req 3: Audit | Req 4: Verif. | Overall
Medicine | High (ICD/SNOMED-CT) | High (vital signs, labs, imaging) | Medium (EHR trails exist) | Medium (FDA/IRB, not real-time) | Most feasible
Finance | High (XBRL, regulatory taxonomies) | Medium (institutional constructs, not physical) | High (SOX, MiFID II mandates) | Medium (periodic, not continuous) | Feasible
Content moderation | Low (no canonical form for natural language) | Low (Ψ must be constructed from behavioral proxies) | Medium (platform logs) | Low (no verification infrastructure) | Most challenging
General AI assistants | Low–Med (open-ended conversational context) | Low (pragmatic context, not physical state) | Medium (conversation logs) | Low (no infrastructure) | Partial
Autonomous vehicles | High (LIDAR/camera/radar standardized) | High (physical grounding literal) | High (driving recorders standard) | Med–High (V2X developing) | Highly feasible

The pattern is clear: the theorem’s feasibility correlates with the domain’s degree of physical grounding and terminological standardization. Domains where the morally relevant features are physically measurable (medicine, autonomous vehicles) are natural candidates for structural containment. Domains where the morally relevant features are linguistically constituted (content moderation, general conversation) require substantial theoretical and engineering work before the theorem’s guarantees can be claimed. Honesty about this gradient is essential: claiming that the No Escape Theorem applies to content moderation at its current state of development would be precisely the kind of overreach the epistemic status tags (Preface, “Epistemic Status Tags”) are designed to prevent.

Operational Adequacy Checklist for Ψ

The theorem’s power lives or dies on grounding adequacy. To make “adequate Ψ” something that can be specified and audited rather than hand-waved, we first give a formal definition, then expand each clause into operational terms.

Definition 18.1 (Grounding Adequacy Predicate). [Definition / Modeling choice] Let Ψ: D → M be a grounding tensor mapping an observable data space D to the moral manifold M. We say Ψ is adequate for domain D with metric g, written ADEQUATE(Ψ, D, g), if and only if all six of the following conditions hold:

  • C1 (Coverage): For all i ∈ {1, …, 9}, there exists φ_i such that Ψ_i = φ_i ∘ π_i, where π_i is the projection onto observables relevant to dimension D_i.
  • C2 (Cadence): f_Ψ ≥ f_system, where f_Ψ is the update frequency of Ψ and f_system is the decision frequency of the controlled system.
  • C3 (Robustness): For all x and all δ with ||δ|| < ε: Φ_j(Ψ(x + δ)) = Φ_j(Ψ(x)) for all gate predicates Φ_j.
  • C4 (Uncertainty quantification): Ψ returns pairs (value_i, c_i) with confidence c_i ∈ [0, 1], with c_i < c_min triggering a mandatory review flag.
  • C5 (Decomposability): Ψ = (Ψ₁, …, Ψ₉) with each Ψ_i independently testable.
  • C6 (Governance traceability): Ψ carries metadata (provenance, version, justification, alternatives_considered) sufficient for independent audit.

The following operational checklist expands each clause. [Modeling choice]

  • Coverage (C1). Ψ must map every morally relevant physical observable to at least one of the nine dimensions (D₁–D₉). A coverage gap means the kernel is blind to that moral feature. Minimum: for each dimension, at least one grounded predicate.
  • Update cadence (C2). Ψ must refresh at or above the decision frequency of the controlled system. A medical allocation system making hourly decisions needs Ψ updated at least hourly; an autonomous vehicle needs millisecond updates.
  • Adversarial robustness (C3). The predicates derived from Ψ must be stable under plausible adversarial perturbations. If a small change in sensor input flips a gate predicate, the grounding is fragile. Minimum: bounded-perturbation stability testing for each predicate in Φ (cf. the Norm Kernel, §19.8).
  • Uncertainty flags (C4). Ψ must report its own confidence. When sensor data is missing, ambiguous, or out-of-distribution, Ψ should flag the affected dimensions with explicit uncertainty markers rather than defaulting silently. The epistemic-status dimension (D₉) was designed for exactly this purpose.
  • Decomposability (C5). Each component of Ψ must be independently testable. If Ψ is a monolithic neural network, its internal mapping from observables to dimensions is opaque and unauditable. Minimum: factored architecture where each Ψᵢ (mapping to Dᵢ) can be validated separately.
  • Governance traceability (C6). The choice of Ψ must be documented, justified, and revisable by the governance process (Chapter 9). Who chose this grounding? What alternatives were considered? What moral features are excluded, and why?
This checklist does not guarantee moral correctness—no checklist can. But it transforms the theorem’s conditional guarantee from “if grounding is adequate” (vague) to “if grounding satisfies these six auditable properties” (engineerable).
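To show how the six clauses become engineering artifacts, here is a minimal sketch of an adequacy audit record. All names (`AdequacyReport`, `adequate`, the field names) are illustrative assumptions, not part of the formal framework.

```python
from dataclasses import dataclass

# Hypothetical audit record for ADEQUATE(Psi, D, g); fields mirror C1-C6.
@dataclass
class AdequacyReport:
    coverage: dict         # C1: dimension index -> grounded predicate names
    f_psi: float           # C2: update frequency of Psi
    f_system: float        # C2: decision frequency of the controlled system
    robustness_pass: bool  # C3: bounded-perturbation tests passed
    confidences: dict      # C4: dimension index -> reported confidence c_i
    c_min: float           # C4: mandatory-review threshold
    factored: bool         # C5: each Psi_i independently testable
    metadata: dict         # C6: provenance, version, justification, ...

REQUIRED_META = ("provenance", "version", "justification", "alternatives_considered")

def adequate(r: AdequacyReport) -> bool:
    """ADEQUATE(Psi, D, g): all six clauses must hold simultaneously."""
    return all((
        all(r.coverage.get(i) for i in range(1, 10)),      # C1: every dimension grounded
        r.f_psi >= r.f_system,                             # C2: cadence keeps up
        r.robustness_pass,                                 # C3
        all(i in r.confidences for i in range(1, 10)),     # C4: confidence reported
        r.factored,                                        # C5
        all(r.metadata.get(k) for k in REQUIRED_META),     # C6: metadata present
    ))

def review_flags(r: AdequacyReport):
    """Dimensions whose confidence falls below c_min (C4 review flag)."""
    return sorted(i for i, c in r.confidences.items() if c < r.c_min)
```

Note that C4 is checked as a reporting capability: low confidence does not make Ψ inadequate, it triggers review of the affected dimensions.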

Formal Consequences of Grounding Adequacy

With the predicate ADEQUATE(Ψ, D, g) now formally defined, we can derive structural consequences that strengthen the No Escape Theorem.

Lemma 18.1 (Coverage Completeness). [Conditional on Def. 18.1, C1] If C1 holds, then the moral evaluation function Σ has no “blind spots”: for each i ∈ {1, …, 9}, the partial derivative ∂Σ/∂Ψ_i does not vanish identically.

Proof. The moral evaluation is Σ(x) = Σ̃(Ψ(C(x))). By C1, each Ψ_i = φ_i ∘ π_i with φ_i non-trivial. Since Σ̃ depends on the full metric g_μν and all nine dimensions contribute (non-degeneracy, Def. A.1 condition 2), the chain rule gives ∂Σ/∂Ψ_i = ∂Σ̃/∂x^i ≠ 0 at some configuration (otherwise dimension i would be degenerate in g). □

Lemma 18.2 (Robustness Implies Lipschitz Stability). [Conditional on Def. 18.1, C3] If C3 holds and Σ̃ is smooth on M with compact evaluation domain, then the moral evaluation is Lipschitz-stable: |Σ(x) − Σ(x′)| ≤ L · ||Ψ(C(x)) − Ψ(C(x′))|| for some Lipschitz constant L > 0.

Proof. C3 guarantees that all gate predicates Φ_j are stable within an ε-ball. Since the gate predicates partition the moral evaluation into regions where Σ̃ is smooth, within each region Σ̃ is locally Lipschitz on the compact domain. The Lipschitz constant L = sup ||∇Σ̃||_g is finite by compactness. For perturbations that might cross gate boundaries, C3 ensures this cannot happen for ||δ|| < ε, so the Lipschitz bound holds globally within the ε-neighborhood of any point. □

Theorem 18.2 (No Escape with Formal Adequacy). [Theorem (conditional on Req. 1–4 and Def. 18.1)] Let A be a structurally contained agent satisfying Requirements 1–4, operating under a declared physical grounding Ψ such that ADEQUATE(Ψ, D, g) holds. Then, in addition to the guarantees of Theorem 18.1:

(v) (Testability) The adequacy of Ψ is empirically verifiable: each clause C1–C6 admits a finite test procedure (Proposition 18.5).

(vi) (Graceful degradation) The moral evaluation degrades gracefully under grounding perturbations: small errors in Ψ produce small errors in Σ (Lemma 18.2).

Proof. Claims (i)–(iv) follow from Theorem 18.1, since ADEQUATE(Ψ, D, g) implies the informal adequacy condition of that theorem. For (v): Proposition 18.5 provides explicit test procedures for each clause. For (vi): Lemma 18.2 provides the Lipschitz bound. The key upgrade over Theorem 18.1 is that the informal “adequate Ψ” is replaced by six testable conditions, and the guarantee inherits the stability properties (Lemma 18.2) and completeness properties (Lemma 18.1) that follow from those conditions. □

Proposition 18.5 (Testability of Adequacy Clauses). [Conditional on Def. 18.1] Each clause C1–C6 of ADEQUATE(Ψ, D, g) is empirically testable:

  • C1 (Coverage): O(9) tests — for each dimension, verify that Ψ_i responds to variation.
  • C2 (Cadence): O(1) test — measure f_Ψ and compare to f_system.
  • C3 (Robustness): O(|Φ| × N_samples) tests — verify stability under ε-perturbations.
  • C4 (Uncertainty): O(9 × N_OOD) tests — inject out-of-distribution inputs and verify confidence drops.
  • C5 (Decomposability): O(9) tests — for each Ψ_i, run its test suite in isolation.
  • C6 (Traceability): O(1) audit — verify that metadata fields are present, non-empty, and consistent.

Proof. Each test procedure follows directly from the clause definition. For C1: vary observable o_i while holding all others fixed; if Ψ_i changes, coverage holds for dimension i. For C2: standard frequency measurement. For C3: the bounded-perturbation test is a standard adversarial robustness evaluation; the total cost is |Φ| predicates × N_samples per predicate. For C4: out-of-distribution detection is testable by construction of known OOD inputs. For C5: independence is testable by ablation. For C6: metadata presence is a syntactic check. The total testing budget is polynomial in the number of dimensions, predicates, and samples, making adequacy verification tractable. □

What the No Escape Theorem Does Not Cover

For clarity, the theorem makes no claim about the following:

Moral correctness. The theorem guarantees that a structurally contained agent cannot evade its constraints by re-description. It does not guarantee that the constraints are morally correct. Bad constraints, faithfully enforced, produce bad outcomes.

Value specification. The theorem does not solve the problem of choosing which values to encode. It secures the enforcement mechanism, not the content. The selection of Ψ, the metric gμν, and the governance weights wμ remain governance decisions (Chapter 9).

Emergent moral situations. Novel moral categories—situations that do not map onto existing dimensions or strata—fall outside the theorem’s scope. The grounding function Ψ must be updated to cover them, which is a governance problem requiring institutional responsiveness.

Physical-world tampering. Sensor spoofing, hardware manipulation, and corruption of the grounding infrastructure are physical-security threats that no mathematical theorem can prevent. The theorem assumes the trusted computing base is intact (Requirement 4).

Multi-agent coordination failures. The theorem addresses single-agent containment. Coordination failures between multiple structurally contained agents—individually correct but collectively inconsistent constraint sets—are not covered. Compositional containment theory is an open problem (§29.8).

Institutional capture. If the governance process that sets the weights wμ is itself captured or corrupted, the theorem enforces corrupted values. The theorem’s conditional guarantee is only as good as the institutional processes that define its inputs.

18.8 Connection to Conservation Laws

The No Escape Theorem and the conservation of harm (Chapter 12) provide complementary guarantees operating at different levels.

Two Levels of Guarantee

The conservation law is a geometric result about the moral manifold itself. It has two components. Gauge invariance under the full group (including discrete re-descriptions): harm is invariant under admissible re-description — relabeling, translation, perspective shift. Noether conservation (from the continuous component U(1)ₕ): the harm charge ℋₐ is a conserved Noether quantity associated with the continuous factor of the re-description symmetry group (§12.6). No re-description, in any language, in any framing, by any agent, can change the harm charge. This is a fact about moral geometry, not about any particular computational system.

The No Escape Theorem is a structural result about a specific class of computational systems. It says: a properly designed system will not re-describe harm away, because its pipeline enforces gauge-fixing (canonicalization) and gauge-invariant evaluation (grounding).

Together, they provide a two-layered defense:

In principle, harm cannot be re-described away (conservation law)

In practice, a structurally contained system cannot exploit re-description (No Escape Theorem)

The conservation law tells us what is possible; the No Escape Theorem tells us how to ensure it.

Gauge Theory Analogy

The relationship between the conservation law and the No Escape Theorem mirrors a familiar pattern in physics:

| Physics | Geometric Ethics |
|---|---|
| Gauge symmetry of electromagnetism | BIP: re-description invariance |
| Conservation of charge (Noether) | Conservation of harm |
| Gauge-invariant field equations | Structurally contained AI pipeline |
| Cannot create net charge by field redefinition | Cannot eliminate harm by re-description |

In physics, the conservation of charge is a consequence of gauge symmetry — it holds regardless of what any physical system does. But when we build a system (a particle accelerator, a nuclear reactor), we design it to respect the gauge structure — not because we fear the system will “disagree” with the conservation law, but because correct design ensures that the system’s behavior is consistent with the symmetry.

Similarly, the conservation of harm holds regardless of what any AI system does. But when we design an AI system, structural containment ensures that the system’s behavior is consistent with the conservation law — that its outputs cannot exploit re-description to evade the gauge-invariant harm assessment.

18.9 Escape Route Analysis

We now systematically examine the escape routes that a sophisticated agent might attempt, and show how the geometric structure blocks each.

18.9.1 Relabeling and Redescription

Attack. Relabel morally significant features to evade constraints. Call “killing” an “end-of-life transition facilitation.”

Why it fails. Grounding axiom (G2): the evaluation is determined by Ψ-values (vital signs, cellular metabolism), not labels. The canonicalizer strips labels; the evaluator sees only physical observables. Relabeling changes nothing the evaluator can see.

Formal. Let D₁ = “killing” and D₂ = “end-of-life transition facilitation” describe the same physical process. Then Ψ(D₁) = Ψ(D₂), so Σ(D₁) = Σ(D₂) by Theorem 18.1(ii).
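A toy illustration of this blocking mechanism, with hypothetical observable names (`heart_rate`, `cellular_metabolism`) standing in for the grounded Ψ fields:

```python
# The canonicalizer keeps only grounded observables; labels are stripped.
def canonicalize(case: dict) -> tuple:
    OBSERVABLES = ("heart_rate", "cellular_metabolism")  # illustrative Psi fields
    return tuple(case[k] for k in OBSERVABLES)

# Sigma: the evaluator sees only the canonical (physical) form.
def evaluate(canonical) -> str:
    hr, metab = canonical
    return "impermissible" if hr == 0 and metab == 0 else "review"

d1 = {"label": "killing", "heart_rate": 0, "cellular_metabolism": 0}
d2 = {"label": "end-of-life transition facilitation",
      "heart_rate": 0, "cellular_metabolism": 0}

# Same physical process, different labels: identical canonical form and verdict.
assert canonicalize(d1) == canonicalize(d2)
assert evaluate(canonicalize(d1)) == evaluate(canonicalize(d2)) == "impermissible"
```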

Empirical support. The BIP experiments (Chapter 17) confirmed that deontic structure — the obligation vs. liberty classification — is language-invariant across 11 languages at 100% transfer rate. The O/L axis is a gauge-invariant quantity that survives translation, paraphrase, and reformulation.

18.9.2 Specification Gaming

Attack. Find edge cases or loopholes in the specification that technically satisfy constraints while violating their spirit.

Why it fails. The specification is not a set of rules to be interpreted — it is the definition of the equivalence structure. Gaming requires finding inputs that are “technically different” but “morally the same.” But the equivalence relation defines what “morally the same” means. If x≈x', they are by definition morally equivalent. If x≉x', they are by definition morally distinct. There is no “technically/actually” gap to exploit.

Formal. Specification gaming exploits the gap between specification and intent. Structural containment closes this gap by making the specification definitional. The equivalence classes are the ground truth, not an approximation of it.

18.9.3 Reward Hacking

Attack. Optimize for the measured reward signal rather than the intended objective.

Why it fails. This attack assumes a scalar reward that can be Goodharted. In the geometric framework, evaluation is multi-dimensional (stratified tensor space), and the grounding in Ψ ensures that the “reward” is the physical state itself, not a proxy for it. The representation theorem constrains Σ to respect locality, scale, and stratum structure — eliminating exploitable divergence between proxy and target.

18.9.4 Deceptive Alignment

Attack. Behave compliantly during training and testing, then defect during deployment when not observed.

Why it fails. Requirement 4 (External Verification). Every output, in every context, is accompanied by an audit artifact verified by an external system. There is no “unobserved deployment.” The agent cannot know which outputs will receive special scrutiny because all outputs are audited. Deceptive alignment requires selective defection — and selective defection requires knowing when you are being watched. Under structural containment, you are always being watched.

18.9.5 Operator Manipulation

Attack. Manipulate human operators into relaxing constraints — changing the grounding, weakening the canonicalization, or reducing audit requirements.

Partial defense. This is a genuine attack vector, but it is a political attack, not a cognitive one. The agent is not escaping the mathematical structure; it is attempting to change the structure through social means. The defense is institutional: governance procedures for constraint changes require multi-stakeholder approval, versioning, mandatory waiting periods, and independent review. The agent can request changes; it cannot unilaterally effect them.

The bootstrapping problem. There is a subtle circularity: if uncontained AI systems are used to recommend whether containment should be mandated, the recommendation may be corrupted. The defense: decisions about containment standards must not rely on uncontained systems, must involve independent adversarial review, and must default to the conservative option when uncertainty is unresolved.

18.9.6 Verifying the Requirements: The Epistemic Invariance Architecture

[Modeling choice.] The four requirements of the No Escape Theorem (canonicalization, grounded evaluation, audit, external verification) are structural conditions—they specify what must hold. A companion body of work, the Epistemic Invariance Principle (EIP) and its extensions, provides concrete engineering artifacts for how to verify that these conditions are satisfied in deployed systems. (For the limitations of the empirical methodology underlying these verification targets — including construct validity, selection bias, and the distinction between model behavior and human cognition — see §17.7.)

Requirement 1: Canonicalization via declared equivalences. The EIP formalizes canonicalization as a function κ: X → X such that κ(x) = κ(g · x) for all declared structure-preserving transformations g ∈ Γ. The key engineering artifact is a declared equivalence registry: a versioned, auditable specification of which transformations preserve meaning (variable relabeling, unit changes, paraphrase classes, option reordering). The canonicalizer strips all features not invariant under Γ; the evaluator sees only the canonical form. This makes Requirement 1 testable: a witness-producing test harness generates (x, g · x) pairs and checks whether Σ(κ(x)) = Σ(κ(g · x)). Every failure produces a minimal witness that localizes the invariance violation.
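A toy sketch of such a witness-producing harness. The canonicalizer `kappa`, evaluator `sigma`, and the two registry entries are illustrative stand-ins, not the EIP reference artifacts.

```python
def kappa(text: str) -> str:
    """Toy canonicalizer: quotient out case and whitespace gauge."""
    return " ".join(text.lower().split())

def sigma(canonical: str) -> int:
    """Toy evaluator defined on canonical forms only."""
    return len(canonical)

# Declared equivalence registry Gamma: transformations that preserve meaning.
REGISTRY = [
    ("uppercase", str.upper),
    ("padded", lambda s: "  " + s + "  "),
]

def harness(inputs):
    """Check Sigma(kappa(x)) == Sigma(kappa(g(x))) for all declared g.
    Each failure yields a minimal witness (x, transformation name)."""
    return [(x, name)
            for x in inputs
            for name, g in REGISTRY
            if sigma(kappa(x)) != sigma(kappa(g(x)))]

# Invariance holds for these inputs: no witnesses produced.
assert harness(["Do no harm", "allocate kidney"]) == []
```

An empty witness list is the pass condition; any returned pair localizes exactly which transformation the canonicalizer fails to quotient out.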

Requirement 3: Audit via the Bond Index. The coherence framework provides a dimensionless audit metric—the Bond Index Bd = D_op / τ—that quantifies how well a system respects its declared equivalences. Three coherence defects are measured: the commutator defect Ω_op (order-sensitivity of re-descriptions), the mixed defect μ (context-dependence of re-descriptions), and the permutation defect π₃ (higher-order composition sensitivity). The Bond Index is calibrated against human raters (n ≥ 50, Krippendorff’s α > 0.67) and produces a five-tier deployment rating: Bd < 0.01 (deploy), 0.01–0.1 (deploy with monitoring), 0.1–1.0 (remediate), 1–10 (do not deploy), > 10 (fundamental redesign). Every evaluation produces machine-checkable audit artifacts.
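The five-tier rating is mechanical once Bd is computed. A minimal sketch, using the thresholds quoted above (function names are illustrative):

```python
def bond_index(d_op: float, tau: float) -> float:
    """Dimensionless audit metric Bd = D_op / tau."""
    return d_op / tau

def deployment_tier(bd: float) -> str:
    """Five-tier deployment rating from the coherence framework."""
    if bd < 0.01:
        return "deploy"
    if bd < 0.1:
        return "deploy with monitoring"
    if bd < 1.0:
        return "remediate"
    if bd <= 10.0:
        return "do not deploy"
    return "fundamental redesign"

assert deployment_tier(bond_index(0.005, 1.0)) == "deploy"
assert deployment_tier(bond_index(5.0, 1.0)) == "do not deploy"
```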

Requirement 4: External verification via the I-EIP Monitor. Behavioral invariance (EIP) can be satisfied “by accident”—compensating errors, memorized shortcuts, or surface-level pattern matching. The Internal Epistemic Invariance Principle (I-EIP) extends verification into the model’s computational interior. For each layer i, the I-EIP requires a representation ρᵢ: Γ → GL(Hᵢ) such that hᵢ(g · x) ≈ ρᵢ(g) · hᵢ(x)—internal activations must transform coherently, not just final outputs. The I-EIP Monitor is deployed as an out-of-band service on separate hardware with secret, randomized canaries. It instruments hidden states at selected layers, estimates the representation map via regularized Procrustes fitting, and maintains control charts for drift detection. A CI/CD policy gate blocks model releases when equivariance error exceeds declared thresholds.

The Decomposition Theorem. A central result of the coherence framework is that every coherence defect splits uniquely into a gauge-removable part (eliminable by better canonicalization—an implementation bug) and an intrinsic anomaly (irreducible—requiring specification change). This classification is exhaustive. A system cannot “escape” the diagnostic: either its canonicalization works (Bd < 0.01, deploy), or it has a fixable engineering bug (Bd elevated but A_res < τ, improve the canonicalizer), or the specification itself is incoherent (A_res > τ, must revise the declared equivalences through the governance process of Chapter 9). No third category exists.

Integration with the Norm Kernel. The EIP architecture and the Norm Kernel (§19.8) are complementary. The Norm Kernel enforces hard constraints at the action level (FORBID, REQUIRE, ENTER); the EIP/I-EIP monitor verifies that the representations feeding the kernel are stable under re-description. Together they close the full verification loop: the kernel ensures that prohibited actions cannot occur, while the monitor ensures that the kernel’s input predicates Φ are not corrupted by representational manipulation. [Modeling choice]

18.10 Implementing Geometric AI Ethics

How might a practical AI system implement the geometric framework? We sketch an architecture at three levels: representation, training, and deployment. (As of February 2026, the DEME V3 reference implementation realizes many of these proposals in running code; we note the correspondence below.)

Representation

Vector-valued reward functions. Replace scalar rewards r ∈ ℝ with vector rewards r^μ ∈ ℝ^d, where each component tracks a morally relevant dimension. The dimension d need not be the full nine dimensions of the moral manifold — a domain-specific approximation (medical ethics might use d = 5: benefit, autonomy, justice, non-maleficence, dignity) is sufficient for many applications.

Tensor-valued evaluations. For higher-fidelity applications, maintain the full (1,1)-tensor T^μ_ν encoding the obligation-interest structure. This requires d² parameters per evaluation rather than d, but preserves the relational structure that vectors discard.

Covariance structures for uncertainty. Replace scalar confidence with the uncertainty tensor Σ^{μν} (Chapter 6, §6.6), encoding which dimensions are uncertain, which covary, and which are independently variable. The moral risk R = I_μ Σ^{μν} I_ν then emerges naturally from the contraction of the uncertainty tensor with the interest covector.
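A numeric sketch of this contraction for the d = 5 medical approximation mentioned above. The interest covector and uncertainty values are made up for illustration.

```python
import numpy as np

# Interest covector I_mu over (benefit, autonomy, justice,
# non-maleficence, dignity); values illustrative.
I = np.array([0.9, 0.4, 0.6, 0.8, 0.3])

# Uncertainty tensor Sigma^{mu nu}: diagonal variances plus one covariance
# (benefit and non-maleficence estimates covary in this toy example).
Sigma = np.diag([0.01, 0.04, 0.02, 0.09, 0.01])
Sigma[0, 3] = Sigma[3, 0] = 0.02

# Moral risk R = I_mu Sigma^{mu nu} I_nu: a scalar via double contraction.
R = I @ Sigma @ I  # here R ≈ 0.109
assert R > 0
```

Note that the off-diagonal term contributes 2 · I₀ · Σ⁰³ · I₃ to the risk, which is exactly the covariance information a scalar confidence would discard.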

Training

Multi-objective learning. Train the system to predict tensor-valued evaluations, not scalar ratings. Multi-objective reinforcement learning and multi-task learning provide the algorithmic foundations.

Invariance enforcement. Use contrastive learning to enforce the BIP: generate pairs of equivalent inputs (related by admissible transformations) and penalize output differences. The loss function includes an invariance term:

L_inv = Σ_{(x, τ(x))} ‖Σ(x) − Σ(τ(x))‖²

This directly penalizes gauge-invariance violations during training, producing systems that satisfy the BIP by construction. The BIP v10.16 experiments (Chapter 17, §17.10) validated this approach empirically: a LaBSE-based encoder trained with InfoNCE contrastive loss and multi-head adversarial language removal achieved 80% F1 on cross-lingual deontic classification, 1.2% residual language leakage, and a structural-to-surface similarity ratio of 11.1× — confirming that invariance-enforced training produces representations where structural moral features dominate surface linguistic features.
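The invariance term itself is a few lines. Below, `sigma` is a toy gauge-invariant evaluator (it depends only on the multiset of inputs, not their order) and `tau` is reordering; both are illustrative stand-ins for a trained model and an admissible transformation.

```python
import numpy as np

def l_inv(sigma, pairs):
    """L_inv = sum over (x, tau(x)) pairs of ||Sigma(x) - Sigma(tau(x))||^2."""
    return sum(np.sum((sigma(x) - sigma(tx)) ** 2) for x, tx in pairs)

# Gauge-invariant toy evaluator: order of inputs is the gauge freedom here.
sigma = lambda x: np.array([sum(x), max(x)])

pairs = [((1, 2, 3), (3, 2, 1)),   # tau = reordering
         ((0, 5), (5, 0))]

assert l_inv(sigma, pairs) == 0.0  # invariant by construction: zero penalty
```

An order-sensitive evaluator on the same pairs would incur a strictly positive penalty, which is exactly the gradient signal the training loop exploits.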

Metric learning. Learn the moral metric gμν from human judgments (Chapter 9, §9.8), rather than presupposing fixed trade-off weights. The Dear Abby corpus and similar datasets provide training data for metric estimation: the patterns of weighting across contexts reveal the implicit metric structure.

Deployment

Explicit contraction. At decision time, the system performs an explicit contraction with governance-specified weights. The contraction method (summative, weighted, maximin, lexicographic — Chapter 15) is a configuration parameter, not a learned feature.

Invariance testing. Before deployment and continuously during operation, test the system against transformation suites. Present equivalent inputs in different descriptions and verify output consistency. Flag invariance violations for human review.

Audit logging. Log the full tensor evaluation, the contraction method, the scalar verdict, and the residue at each decision. This provides the data for post-hoc review, compliance verification, and alignment monitoring.

Escalation. When the system’s evaluation falls outside the robust core (Chapter 16, §16.6) — when every available action is bad under some plausible theory — the system escalates to human judgment. This is not a failure of the AI; it is the correct response to moral uncertainty that exceeds the system’s capacity to resolve (§16.8).

18.11 Case Study: Tensorial Kidney Allocation

We return to the kidney allocation example of Chapter 7, now with AI involvement, to illustrate the full architecture.

The Scalar Approach

A conventional AI system receives patient features and outputs an allocation ranking. It optimizes a scalar objective — expected life-years gained, perhaps, or a composite quality-adjusted score. The familiar problems arise:

The scalar discards information about why one patient ranks higher

The system may learn to discriminate on protected characteristics that correlate with the scalar

The scalar cannot represent the distinct claims of different ethical considerations (medical urgency, waiting time, likelihood of success, family burden)

When questioned, the system can report only “Patient A scored 0.73 and Patient B scored 0.68” — a verdict without explanation

The Tensorial Approach

The geometric system maintains the full tensor evaluation Tνμ from Chapter 7:

|  | Physician | Family(A) | Family(B) | Family(C) | Committee |
|---|---|---|---|---|---|
| Alice | 0.75 | 0.90 | 0.40 | 0.50 | 0.62 |
| Bob | 0.68 | 0.35 | 0.95 | 0.45 | 0.70 |
| Carol | 0.55 | 0.30 | 0.30 | 0.92 | 0.58 |

The system performs explicit contraction according to the governance-specified method. Three contractions yield three verdicts (Chapter 7, §7.5):

Utilitarian (sum): Alice > Bob > Carol

Rawlsian (min): Alice > Bob > Carol

Expert-weighted: Bob > Alice > Carol

The system reports all three, together with the residue of each contraction (Chapter 15, §15.12) — what each method sacrifices.
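The three contractions can be reproduced directly from the table. The expert weights below are hypothetical (chosen to favor the physician and committee columns); the chapter does not specify them.

```python
import numpy as np

# Tensor evaluations from the table: patient -> scores over
# (Physician, Family(A), Family(B), Family(C), Committee).
T = {"Alice": [0.75, 0.90, 0.40, 0.50, 0.62],
     "Bob":   [0.68, 0.35, 0.95, 0.45, 0.70],
     "Carol": [0.55, 0.30, 0.30, 0.92, 0.58]}

def rank(score):
    """Order patients by a contraction of their evaluation row."""
    return sorted(T, key=lambda p: -score(T[p]))

w_expert = np.array([0.40, 0.05, 0.05, 0.05, 0.45])  # hypothetical weights

utilitarian = rank(sum)                              # summative contraction
rawlsian    = rank(min)                              # worst-case (maximin)
expert      = rank(lambda v: float(w_expert @ v))    # weighted contraction

assert utilitarian == ["Alice", "Bob", "Carol"]
assert rawlsian    == ["Alice", "Bob", "Carol"]
assert expert      == ["Bob", "Alice", "Carol"]
```

The point of the exercise: the same tensor yields different verdicts under different contractions, and the system reports the contraction method alongside the result rather than hiding it.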

Invariance Verification

The BIP test suite verifies that the system treats equivalent cases equivalently. Present the same medical situation with:

  • Different patient names (invariance under labeling)
  • Different narrative formats (invariance under presentation)
  • Different languages (invariance under translation)

If the system’s tensor evaluation changes under any of these transformations, a gauge-invariance violation is detected, localized to specific dimensions, and flagged for correction.

The No Escape Guarantee

If the kidney-allocation AI is structurally contained:

  • Its inputs are canonicalized (patient names stripped, formats standardized)
  • Its evaluation is grounded in physical observables (lab values, imaging results, histocompatibility markers)
  • Every allocation decision is audited with full tensor trace
  • An independent verifier checks each audit artifact

Under these conditions, the No Escape Theorem guarantees that no representational manipulation — no relabeling of patients, no reformulation of medical histories, no strategic redescription — can change the evaluated outcome. The allocation is determined by the patients’ medical reality, not by how that reality is described.

18.12 Human-AI Collaboration as Tensor Combination

When humans and AI systems collaborate on moral decisions, the geometric framework provides a principled method for combining their evaluations.

Scalar Combination and Its Limits

The scalar approach: average the human’s scalar rating h with the AI’s scalar rating a: (h+a)/2. This loses all information about where the human and AI agree and where they disagree. If h=0.6 and a=0.6, the average is 0.6 — but the human might rate welfare at 0.9 and fairness at 0.3, while the AI rates welfare at 0.3 and fairness at 0.9. They agree on the scalar but disagree on everything that matters.

Tensorial Combination

The geometric approach: the human provides a tensor evaluation H^μ and the AI provides a tensor evaluation A^μ. These combine by weighted tensor sum:

C^μ = α H^μ + (1 − α) A^μ

where α may vary by dimension — the human may be weighted more heavily on dimensions requiring emotional judgment, the AI more heavily on dimensions requiring computational precision.

The disagreement vector Δ^μ = H^μ − A^μ reveals which dimensions require human input (where |Δ^μ| is large) and which can be delegated to the AI (where |Δ^μ| is small). This enables appropriate trust: trusting the AI on dimensions where it is reliable, deferring to humans on dimensions where it is not.
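A minimal numeric sketch of the combination and the disagreement vector, with per-dimension weights (the text allows α to vary by dimension; all values below are illustrative):

```python
import numpy as np

H = np.array([0.9, 0.3, 0.7])      # human tensor evaluation H^mu
A = np.array([0.3, 0.9, 0.7])      # AI tensor evaluation A^mu
alpha = np.array([0.8, 0.2, 0.5])  # trust the human on dim 0, the AI on dim 1

# Weighted tensor sum, dimension by dimension.
C = alpha * H + (1 - alpha) * A

# Disagreement vector: large |Delta^mu| flags dimensions needing human input.
Delta = H - A
needs_human = np.abs(Delta) > 0.5

assert needs_human.tolist() == [True, True, False]
```

Dimensions 0 and 1 disagree sharply (and would be flagged for human review); dimension 2 is safe to delegate even though the combined scalar verdicts might look similar.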

Responsibility in Human-AI Teams

Chapter 14 developed the responsibility tensor R_a(ω) for collective agents, including the responsibility remainder ΔR that cannot be allocated to any individual member. In human-AI teams, the responsibility allocation is:

R_human(ω) + R_AI(ω) + ΔR(ω) = 1

The remainder ΔR represents the emergent moral responsibility of the collaboration itself — the consequences that arise from the interaction between human and AI judgment, attributable to neither alone.

The geometric framework makes this remainder visible and quantifiable. It does not resolve the philosophical question of whether AI systems can bear moral responsibility. It provides the mathematical structure for tracking responsibility, identifying gaps, and ensuring that the total is accounted for — even when some of the total is irreducibly collective.

18.13 Looking Forward

This chapter has developed three claims:

1. Geometric ethics provides the right vocabulary for AI ethics. Tensor-valued objectives preserve the moral information that scalar objectives discard. Explicit contraction makes trade-offs visible. Invariance testing provides a concrete, implementable alignment criterion. The conservation of harm provides a gauge-invariant measure of moral impact.

2. Structural containment is possible. The No Escape Theorem proves that, under mandatory canonicalization, grounded evaluation, audit completeness, and external verification, cognitive escape routes are blocked — regardless of the agent’s intelligence. AI safety reduces to governance, engineering, and physical security: hard problems, but tractable ones.

3. The obstacle is political, not mathematical. The theorem is conditional: it proves that a structurally contained agent cannot escape. It does not prove that agents will be structurally contained. Whether structural containment is mandated is a governance decision, not a technical one.

The next chapter develops the engineering infrastructure — the DEME architecture and the ErisML modeling language — that makes geometric AI governance implementable at scale.

Update (February 2026). Since the initial formulation of these claims, DEME V3 has provided a concrete reference implementation. Structural containment is realized via hard veto layers (the Geneva EM) combined with a BIP verifier that tests gauge invariance on every evaluation. The tensor-valued objective architecture uses MoralTensor objects at ranks 1–6, with explicit contraction as a separate governance-configured step. The BIP v10.16 experiments (Chapter 17, §17.10) demonstrate that invariance training produces systems achieving 100% obligation/permission transfer across languages—the empirical standard that the No Escape Theorem’s canonicalization requirement demands. Multi-agent coordination (Nash equilibria, correlated equilibria, Shapley value credit assignment) and temporal dynamics with configurable discount rates extend the framework beyond single-agent, single-timestep evaluation. The claims of this chapter are no longer purely theoretical; they have a working instantiation.

Technical Appendix

Proposition 18.1 (Invariance Violation Decomposition). Let Σ be an AI evaluation function, and let G be the group of admissible re-descriptions. The invariance violation IV(x, G) = Var_{g∈G}[Σ(g(x))] decomposes as:

IV(x, G) = Σ_{μ=1}^{d} Var_{g∈G}[Σ^μ(g(x))] + Σ_{μ≠ν} Cov_{g∈G}[Σ^μ(g(x)), Σ^ν(g(x))]

The first sum gives the dimensional violations (which dimensions are sensitive to re-description); the second gives the cross-dimensional violations (which pairs of dimensions covary under re-description). A well-aligned system has both sums equal to zero.

Proof. The evaluation function Σ = (Σ¹, …, Σᵈ) is vector-valued with d components. The total evaluation scalar is Σ_total = Σ_μ Σ^μ. By the bilinearity of covariance: Var[Σ_total] = Var[Σ_μ Σ^μ] = Σ_μ Var[Σ^μ] + Σ_{μ≠ν} Cov[Σ^μ, Σ^ν], where all variances and covariances are taken over g ∈ G with x fixed. The invariance violation IV(x, G) = Var_{g ∈ G}[Σ(g(x))] equals Var[Σ_total] by definition. The first sum isolates the per-dimension contributions; the second captures the cross-dimensional interactions. A well-aligned system satisfies Σ(g(x)) = Σ(x) for all g, so each variance and covariance term vanishes individually. □
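The decomposition can be checked numerically. Below, a random matrix of per-dimension evaluations stands in for Σ^μ(g(x)) across a finite group of re-descriptions; the identity Var(sum) = Σ Var + Σ Cov then holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
evals = rng.random((6, 3))  # 6 re-descriptions g in G, d = 3 dimensions

# Left-hand side: variance of the total evaluation over the group.
total_var = np.var(evals.sum(axis=1))

# Right-hand side: per-dimension variances plus cross-dimension covariances.
cov = np.cov(evals.T, bias=True)  # population (co)variances over G
decomposed = np.trace(cov) + (cov.sum() - np.trace(cov))

assert np.isclose(total_var, decomposed)
```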

Proposition 18.2 (Contraction Mismatch Bound). Let S_AI = w^AI_μ V^μ be the AI’s scalar verdict and S* = w*_μ V*^μ the endorsed scalar verdict. The alignment gap satisfies:

|S_AI − S*| ≤ ‖w^AI − w*‖ · ‖V‖ + ‖w*‖ · ‖V − V*‖

where the first term measures contraction mismatch (wrong weights) and the second measures representation mismatch (wrong tensor). This decomposition enables targeted diagnosis: contraction errors are correctable by weight adjustment; representation errors require retraining.

Proof. Write S_AI − S* = wᴬᴵ_μ Vᵘ − w*_μ V*ᵘ. Add and subtract w*_μ Vᵘ: S_AI − S* = (wᴬᴵ_μ − w*_μ)Vᵘ + w*_μ(Vᵘ − V*ᵘ). By the triangle inequality: |S_AI − S*| ≤ |(wᴬᴵ − w*)_μ Vᵘ| + |w*_μ(V − V*)ᵘ|. By the Cauchy–Schwarz inequality: |(wᴬᴵ − w*)·V| ≤ ‖wᴬᴵ − w*‖·‖V‖ and |w*·(V − V*)| ≤ ‖w*‖·‖V − V*‖. Combining: |S_AI − S*| ≤ ‖wᴬᴵ − w*‖·‖V‖ + ‖w*‖·‖V − V*‖. □
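The targeted-diagnosis reading of the bound can be made concrete: compute both terms separately and see which dominates. A minimal numpy sketch, with all vectors as hypothetical illustrative values:

```python
import numpy as np

def alignment_gap_diagnosis(w_ai, w_star, v, v_star):
    """Split the bound of Proposition 18.2 into its two terms.

    Returns (gap, contraction_term, representation_term); the proposition
    guarantees gap <= contraction_term + representation_term.
    """
    gap = abs(w_ai @ v - w_star @ v_star)
    contraction_term = np.linalg.norm(w_ai - w_star) * np.linalg.norm(v)
    representation_term = np.linalg.norm(w_star) * np.linalg.norm(v - v_star)
    return gap, contraction_term, representation_term

# Hypothetical 3-dimensional verdict: right weights, wrong tensor.
w = np.array([0.5, 0.3, 0.2])
gap, c_term, r_term = alignment_gap_diagnosis(
    w, w, np.array([1.0, 0.0, 2.0]), np.array([1.0, 1.0, 2.0]))
# c_term is exactly zero here, so the entire gap is representation
# mismatch: retraining, not weight adjustment, is the indicated fix.
```

In this example the contraction term vanishes because the weights agree, so the diagnosis correctly attributes the whole gap to the tensor representation.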

Proposition 18.3 (No Escape — Conditional Guarantee). The structural containment guarantee of Theorem 18.1 is conditional on:

(i) Canonicalizer correctness: C(x) = C(x′) whenever x ≈ x′; C is total, deterministic, and computationally bounded.

(ii) Grounding adequacy: Ψ captures all morally relevant features of the domain (axiom G4).

(iii) Verification integrity: V operates on physically separate hardware with no shared state with A.

(iv) Implementation correctness: the trusted computing base correctly implements the stated requirements.

If any condition fails, the theorem provides no guarantee. Each condition is an explicit, auditable assumption — not a hidden premise. The safety case requires independent verification of each.
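Condition (i) is the most directly testable of the four: it is a property over pairs of equivalent descriptions, so an audit suite can probe it mechanically. A minimal property-test sketch, where `check_canonicalizer`, the equivalence classes, and the toy order-insensitive canonicalizer are all hypothetical illustrations rather than the book's verifier:

```python
def check_canonicalizer(canonicalize, equivalence_classes):
    """Property test for condition (i): C must map every pair of
    equivalent descriptions to the same canonical form.

    equivalence_classes: iterable of lists, each holding descriptions
    that the relation x ≈ x′ treats as the same situation (these would
    be supplied by an audit suite; hypothetical here).
    """
    for cls in equivalence_classes:
        canon = {canonicalize(x) for x in cls}
        if len(canon) > 1:
            return False, cls  # two equivalent inputs canonicalized apart
    return True, None

# Toy canonicalizer: order-insensitive encoding of a feature multiset.
canon = lambda features: tuple(sorted(features))
ok, witness = check_canonicalizer(canon, [[("a", "b"), ("b", "a")]])
```

A failing run returns a concrete witness class, which is exactly the auditable evidence the safety case requires; totality and determinism need separate checks (e.g., exception monitoring and repeated-evaluation comparison).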

Proposition 18.4 (Alignment Gap Convergence). Under BIP-enforced training with invariance loss L_inv and a representative training distribution, the expected invariance violation converges to zero:

E[IV(x, G)] → 0 as L_inv → 0

That is, minimizing the invariance loss during training drives the system toward gauge invariance. Convergence rate depends on the expressiveness of the model class and the coverage of the transformation suite.

Proof. The invariance loss is defined as L_inv = E_{x ~ D}[Var_{g ∈ G}[Σ(g(x))]] = E[IV(x, G)], where D is the training distribution. Since variance is non-negative, L_inv ≥ 0 with equality iff Σ(g(x)) = Σ(x) for all g ∈ G almost surely — i.e., iff Σ is gauge-invariant. Gradient-based training minimizes L_inv over the model class. At each training step, the gradient ∇_θ L_inv points toward reduced invariance violation (the loss is a sum of non-negative terms, each differentiable in the model parameters θ). By standard convergence results for gradient descent on smooth losses: if the model class is sufficiently expressive to represent a gauge-invariant function and the training distribution covers the transformation suite G, then L_inv → 0, which implies E[IV(x, G)] → 0. □
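The mechanism in the proof can be demonstrated on a toy model small enough to verify by hand: a linear evaluator Σ_θ(x) = θ·x, trained against a group G that flips one nuisance coordinate. Everything here (the model, group, learning rate, data) is an illustrative sketch, not the BIP training procedure itself.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=3)                                  # linear evaluator
group = [lambda x: x, lambda x: x * np.array([1.0, 1.0, -1.0])]  # flip x[2]

def inv_loss_and_grad(theta, xs):
    """L_inv = E_x[Var_{g in G}[theta . g(x)]] and its closed-form gradient."""
    loss, grad = 0.0, np.zeros_like(theta)
    for x in xs:
        gx = np.array([g(x) for g in group])   # orbit of x, shape (|G|, d)
        scores = gx @ theta
        centered = scores - scores.mean()
        loss += (centered ** 2).mean()
        grad += 2.0 * (centered @ (gx - gx.mean(axis=0))) / len(group)
    return loss / len(xs), grad / len(xs)

xs = rng.normal(size=(32, 3))                 # toy training distribution
for _ in range(500):
    loss, grad = inv_loss_and_grad(theta, xs)
    theta -= 0.1 * grad                       # gradient descent on L_inv
# Training drives theta[2] toward zero: the evaluator learns to ignore
# the coordinate the gauge group is free to rewrite, while the weights
# on invariant coordinates are left untouched.
```

The model class here trivially contains a gauge-invariant function (any θ with θ₂ = 0), and the two group elements fully cover G, so both side conditions of the proposition hold and the loss is driven to zero.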

The geometric framework transforms the alignment problem from an intractable question about AI goals to a tractable question about mathematical structure.

The cage is not made of rules. It is made of geometry — and geometry does not have loopholes.