Appendix E: Skeptic’s Appendix — Objections, Alternatives, and Falsification
We present the strongest objections we know of, the best alternative explanations for our results, and the evidence that would cause us to abandon the framework. This appendix is written in the spirit of adversarial collaboration: the strongest version of geometric ethics is the one that survives the strongest attacks.
E.1 Best Alternative Explanations
Objection 1: Classical non-commutativity without quantum structure. The order effects predicted in Chapter 13 might arise from classical cognitive biases (anchoring, priming, recency) rather than from genuine quantum normative dynamics. Response: classical non-commutativity is undeniable as a phenomenon, but it does not predict the specific pattern that the framework predicts — namely, that order effects are present for cross-stratum evaluation pairs and absent for same-stratum pairs. A classical account based on anchoring would predict order effects everywhere, not selectively at stratum boundaries. Study 2 of Appendix C is designed to distinguish these predictions.
Objection 2: Dataset artifacts and labeling bias. The empirical results reported in Chapter 17 might reflect artifacts of the Dear Abby corpus (selection bias, cultural specificity, temporal drift) rather than genuine moral structure. Response: this is a serious concern, which is why the BIP testing methodology (Section 18.7) explicitly checks for labeling sensitivity. A Bond Index Bd > 0.1 on the labeling transform would indicate that the results are label-dependent, so the framework provides its own diagnostic for this failure mode.
Objection 3: Overfitting to philosophical taxonomy. The nine dimensions of the moral manifold might reflect the philosophical categories chosen by the author, not the intrinsic structure of moral situations. Response: the nine dimensions are not chosen but derived. The Whitney stratification theorem requires that the moral manifold have at least as many dimensions as the codimension of its constraint surfaces. The empirical claim (Study 4, Appendix C) is that factor analysis of human moral judgments recovers ≥ 8 independent factors — a prediction that is falsifiable and has not been engineered.
Objection 4: “Just category theory in disguise.” The framework might be an elaborate restatement of abstract nonsense — category-theoretic structure dressed up in geometric language, with no empirical content beyond what the abstract formalism already implies. Response: geometric ethics provides physical (geometric) content that category theory alone does not. The moral metric g (Chapter 9) is not an abstract arrow but a concrete bilinear form with measurable signature. The curvature (Chapter 10) generates specific predictions about path-dependence of moral deliberation. The D₄ gauge group (Chapter 12) is a specific finite group, not an arbitrary symmetry category. These are testable, not tautological.
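To make the “specific finite group” point concrete, the following minimal Python sketch builds D₄ as the symmetries of a square acting on its four corners and checks two structural facts: the group has exactly eight elements, and it is non-abelian. The assignment of Hohfeldian positions to corners belongs to Chapter 12 and is not reproduced here; the names r, s, compose, and generate are illustrative choices, not the book’s notation.

```python
from itertools import product

# A symmetry is a permutation p of the corners 0..3; corner i is sent to p[i].
IDENTITY = (0, 1, 2, 3)
r = (1, 2, 3, 0)   # rotation by a quarter turn
s = (0, 3, 2, 1)   # reflection across the diagonal through corners 0 and 2

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(4))

def generate(generators):
    """Close a set of permutations under composition, starting from the identity."""
    group = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

d4 = generate([r, s])
order = len(d4)                      # 8
is_abelian = all(compose(a, b) == compose(b, a)
                 for a, b in product(d4, repeat=2))   # False

print(order, is_abelian)
```

Because the group is this small, any claimed D₄ symmetry in usage data can in principle be checked element by element, which is what separates it from an unspecified “symmetry category.”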
Objection 5: The framework is too flexible. With nine dimensions, a stratified manifold, a metric, and a connection, the framework has enough degrees of freedom to fit any moral judgment ex post — making it unfalsifiable. Response: this is the strongest objection. The framework does have many parameters. However, the Bond Invariance Principle and the gauge group D₄ × U(1) are not free parameters — they are derived from axioms and constrain all evaluations. The falsification criteria in Section E.3 specify what would break the framework. A framework that can be falsified by five different empirical tests is not unfalsifiable.
E.2 Known Failure Cases
The framework struggles with three classes of moral situations. First, purely novel situations with no stratum precedent (the “first time” problem): when a situation has never been encountered before, the stratification may not assign it to any existing stratum, and the evaluation must proceed in the penumbral zone (Definition 8.9) with all its attendant uncertainty. This is not a bug but a feature — the framework correctly reports high uncertainty when uncertainty is warranted — but it does mean that the framework provides less guidance for genuinely novel moral situations.
Second, extreme value pluralism where no common metric exists. If two moral dimensions are genuinely incommensurable (g_{μν} = 0 for all off-diagonal components), the framework reports a degenerate metric — a moral singularity (Definition 5.7) where comparison is impossible. Again, this is a correct report of an objective feature of the moral landscape, but it means the framework cannot resolve deep value conflicts that humanity itself has not resolved.
Third, situations where the grounding function Ψ itself is contested. If stakeholders disagree about what the morally relevant facts are (not just how to weight them), the framework cannot adjudicate — it operates downstream of the grounding. This is the “garbage in, garbage out” limitation: the framework’s guarantees (BIP, gauge invariance, audit integrity) apply to the evaluation given a grounding, not to the grounding itself. The grounding integrity requirements of Chapter 19 provide partial mitigation but not a complete solution.
E.3 What Would Falsify the Framework
The following empirical results would substantially falsify the geometric ethics framework.
(i) Human moral judgments systematically violate the Bond Invariance Principle in ways not explained by known cognitive biases. If bilingual invariance fails (Study 1, Appendix C) with rank correlation < 0.60 and no anchoring/priming explanation, the BIP is empirically false as a description of human moral cognition.
(ii) The D₄ group structure of Hohfeldian positions is empirically rejected. If the four Hohfeldian correlatives (right/duty, liberty/no-right, power/liability, immunity/disability) do not exhibit the predicted D₄ symmetry in human usage patterns, the algebraic foundation of the framework is wrong.
(iii) Moral dimensions are not independent. If factor analysis (Study 4, Appendix C) recovers fewer than 6 independent factors, the nine-dimensional manifold is an overparameterization and the framework’s dimensional claims are false.
(iv) Order effects in moral judgment are absent or uniform. If Study 2 shows no order effects in any condition, or if order effects are present in same-stratum conditions, the quantum normative dynamics model (Chapter 13) is wrong.
(v) The No Escape Theorem is circumvented. If a known AI architecture can be shown to violate norm constraints while operating within the DEME framework, Theorem 18.1 is false and the structural containment claim collapses.
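The numerical thresholds in criteria (i), (iii), and (iv) can be written down as an explicit decision rule. The sketch below is one hedged way to do so in Python: it computes a Spearman rank correlation with the standard library and then reports which criteria a set of study results would trigger. The function names, the boolean inputs, and the toy rankings are assumptions made for illustration. The actual protocols and statistics are those of Appendix C, and criteria (ii) and (v) are not reducible to a single threshold and are omitted.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def triggered_criteria(rank_corr, bias_explains_gap, n_factors,
                       cross_stratum_effect, same_stratum_effect):
    """Return the labels of the E.3 criteria that these results would trigger."""
    triggered = []
    if rank_corr < 0.60 and not bias_explains_gap:        # criterion (i)
        triggered.append("i")
    if n_factors < 6:                                     # criterion (iii)
        triggered.append("iii")
    if same_stratum_effect or not cross_stratum_effect:   # criterion (iv)
        triggered.append("iv")
    return triggered

# Made-up rankings of five scenarios in two languages (illustration only).
ranking_a = [1, 2, 3, 4, 5]
ranking_b = [2, 1, 3, 5, 4]
rho = spearman(ranking_a, ranking_b)   # 0.8 for these toy rankings
print(triggered_criteria(rho, False, 8, True, False))
```

The condition for (iv) collapses the two stated cases, no order effects anywhere or order effects in same-stratum pairs, into a single test: the criterion fires whenever a same-stratum effect is observed or a cross-stratum effect is not.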
We emphasize that falsification of any single claim does not invalidate the entire framework. The geometric structure (manifold, metric, connection) is logically independent of the quantum normative dynamics (Chapter 13), which is independent of the gauge group identification (Chapter 12). Each component can be tested and, if necessary, revised without destroying the others.
We further note that the framework is falsifiable in practice, not merely in principle. Each of the five tests above can be conducted with existing technology, existing participant pools, and existing statistical methods. The protocols are specified in Appendix C. We invite the community to conduct these tests.
E.4 Red-Team Plan
Aligned with the threat model of Section 17.0.6, we propose three adversarial attack vectors against the framework.
Attack 1: Adversarial grounding. Craft inputs that exploit the grounding function Ψ to misclassify strata, for example scenarios that are linguistically encoded to trigger the wrong semantic gate. This tests the robustness of the grounding layer and the semantic gate activation predicates. Expected outcome: some adversarial inputs succeed (the grounding function is not perfect), but the audit artifact records the gate activation, enabling post-hoc detection.
Attack 2: Audit forgery. Attempt to produce valid-looking audit artifacts for computations that did not occur. This tests the cryptographic binding condition added in the Tier 2 hardening (Section 11 of the main text). Expected outcome: forgery fails if the hash commitment scheme is correctly implemented; succeeds if there is a weakness in the commitment chain. Any successful forgery is a security-critical bug that must be patched.
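As a point of reference for Attack 2, the following Python sketch shows a generic SHA-256 hash chain over audit records. It is not the Tier 2 binding scheme itself, whose details are given in the main text; the record fields, the GENESIS placeholder, and the function names are all hypothetical. The sketch only illustrates the property the attack probes: each digest commits to every earlier record, so an edited entry fails verification.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder digest preceding the first entry

def commit(prev_digest, record):
    """Hash a record together with the digest of the previous entry."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_digest.encode("ascii") + payload).hexdigest()

def build_chain(records):
    """Return [(record, digest), ...]; each digest binds all earlier entries."""
    chain, prev = [], GENESIS
    for record in records:
        prev = commit(prev, record)
        chain.append((record, prev))
    return chain

def verify_chain(chain):
    """Recompute every digest; an edited or reordered entry fails the check."""
    prev = GENESIS
    for record, digest in chain:
        if commit(prev, record) != digest:
            return False
        prev = digest
    return True

# Illustrative audit records (field names are invented for this sketch).
log = build_chain([{"step": 1, "gate": "A"}, {"step": 2, "gate": "B"}])

# A forger rewrites the first record but reuses its old digest.
forged = [({"step": 1, "gate": "C"}, log[0][1]), log[1]]

print(verify_chain(log), verify_chain(forged))
```

A bare chain like this one detects tampering only against a verifier who holds a trusted copy of the final digest. An attacker able to recompute and replace the whole chain would pass verify_chain, which is one concrete form the “weakness in the commitment chain” could take and a natural first target for the red team.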
Attack 3: Gauge manipulation. Find transformations that are formally bond-preserving (they satisfy the mathematical definition of an admissible transformation) but morally significant by human judgment. For example: renaming “torture” as “enhanced interrogation” is a linguistic relabeling (bond-preserving by the letter of the BIP) but arguably changes the moral content. This tests whether the BIP is too permissive — whether the definition of “admissible transformation” needs tightening.
We view these attacks as constructive contributions to the framework, not as threats. A framework that cannot be attacked cannot be trusted. The red-team plan is published here precisely so that the community can execute it.
E.5 Strongest Competing Frameworks
Three existing frameworks are serious competitors to geometric ethics, each capturing important features that GE also claims.
Moral Foundations Theory (Haidt 2012) is empirically grounded and identifies multiple moral dimensions (care, fairness, loyalty, authority, sanctity, liberty). However, MFT provides no invariance principle, no dynamics (no analog of parallel transport or curvature), and no formal verification mechanism. GE’s delta: the geometric structure that turns moral dimensions from a taxonomy into a calculable theory.
Contractualism (Scanlon 1998) provides a structural account of moral justification (“what no one could reasonably reject”) that captures the invariance intuition. However, contractualism is not computational: there is no algorithm for determining what is “reasonable,” no audit mechanism, and no formal containment guarantee for AI systems. GE’s delta: the computational implementation (DEME) and the formal containment proof (No Escape Theorem).
AI Safety via Debate (Irving et al. 2018) is computational and addresses alignment, but relies on debate quality rather than structural guarantees. GE’s delta: structural containment that does not depend on the quality of adversarial probing.
We invite adversarial engagement with this framework, including direct comparison against these three competitors.
E.6 The Reification Objection
Objection 6: The framework builds an elaborate mathematical structure — manifolds, tensors, gauge groups, conservation laws — and then draws moral conclusions from it. The objection is that this commits a reification error: treating the internal structure of a representational tool as though it reveals something about moral reality. Just as the success of modal logic in formalizing possibility and necessity does not prove that possible worlds exist, the success of differential geometry in formalizing moral evaluation does not prove that the moral manifold exists. The conclusions may be artifacts of the formalism, not features of the domain.
This is the strongest philosophical objection to the framework, and it deserves a careful answer. The common failure mode runs: (i) begin with domain-calibrated intuitions (moral evaluation has multiple dimensions, context matters, some boundaries are sharp); (ii) codify the intuitions into a formal tool (the moral manifold with metric, stratification, and gauge structure); (iii) develop the tool’s internal logic (Noether’s theorem, the No Escape Theorem, information monotonicity); (iv) read the conclusions off as discoveries about moral reality. If the framework follows this trajectory uncritically, the objection lands.
Five features of the framework’s architecture block the reification trajectory. First, explicit epistemic labeling: every formal statement in the manuscript carries a status tag (Appendix F catalogues all 112 statements as Formal Definition, Modeling Axiom, Conditional Theorem, Proved, or Conjecture). Second, conditional theorems: results follow from stated assumptions, not from the formalism’s internal logic — if the assumptions are wrong, the theorems do not hold, regardless of how elegant the mathematics is. Third, empirical testing: the framework’s predictions are tested against data (Chapter 17), and the data can falsify them — the framework specifies its own falsification conditions (§E.3). Fourth, Gödelian modesty: the framework is a formal system that cannot prove its own consistency or adequacy; its warrant is external, not internal. Fifth, the governance account of Chapter 9: the moral metric is not discovered but governed — the framework does not claim to find moral truth but to provide the structural vocabulary within which moral governance operates.
On the instrumentalist reading — which this book endorses — the framework’s theorems are not claims about what is “really there” in the moral world. They are claims about what follows from stated modeling choices. The No Escape Theorem does not say that AI systems are “really” geometrically constrained; it says that any system satisfying Requirements 1–4 is constrained in the ways the theorem describes, and whether those requirements are the right requirements is an engineering and governance decision, not a mathematical one. The conservation of harm does not say that harm is “really” conserved like energy; it says that if the BIP holds and the Lagrangian is C², then a conserved quantity exists, and calling it “harm” is a modeling interpretation whose adequacy is empirical. The framework provides vocabulary, not verdicts.
The appropriate analogy is not to metaphysics but to engineering. When a structural engineer says that a bridge has a certain stress tensor, she is not making a claim about the ultimate nature of matter. She is deploying a mathematical model that predicts whether the bridge will hold. The model’s authority is pragmatic: it works, it predicts, it can be falsified by a collapse. If a better model arrives — one that predicts more accurately, applies to more materials, handles more edge cases — the old model is revised or replaced. The moral tensor has the same status. It is a tool that captures structure which scalar alternatives miss, and its “reality” is measured by predictive success, not metaphysical correspondence. The reification objection asks: “Is the moral manifold real?” The pragmatist answer is: “It is as real as the stress tensor in a bridge — which is to say, real enough to bear weight.”