Appendix B: Reproduction Cookbook

This appendix provides a step-by-step guide to reproducing the computational claims of geometric ethics using the ErisML reference implementation. No prior familiarity with the codebase is assumed. The entire sequence can be completed on a standard laptop in under 60 minutes.

B.1 Prerequisites

Requirements: any machine that can run Python 3.10 or later. No GPU required. The full test suite runs in under 5 minutes on a 2020-era laptop. Memory requirement: under 500 MB.

Installation: pip install erisml. This installs the ErisML library and all dependencies (numpy, scipy, pydantic, lark, pettingzoo, gymnasium). Verify the installation with: python -c "import erisml; print(erisml.__version__)". Expected output: 2.0.0 or later.
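The prerequisite check can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name check_environment is hypothetical, and it assumes the installed distribution is named erisml, as the pip command above suggests.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum interpreter version stated in B.1


def check_environment(package="erisml"):
    """Report whether the interpreter is new enough and which version is installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "package_version": installed,
    }


report = check_environment()
print(report)
```

A package_version of None means the installation step has not succeeded in the active environment; any other value should be compared by eye against the expected 2.0.0 or later.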

B.2 Quick Start: 10-Minute Smoke Test

Step 1: Run the minimal DEME demonstration. Command: python -m erisml.examples.hello_deme. This creates two candidate options (one respecting rights, one violating them), evaluates both with a RightsFirstEM ethics module, and prints the verdicts. Expected output: the rights-respecting option is preferred; the rights-violating option is forbidden. Runtime: under 2 seconds. This exercises the Layer 1 (EthicalFacts generation) and Layer 2 (Ethics Module evaluation) pipeline described in Chapter 18.6.

Step 2: Run the D₄ gauge structure verification. Command: python -m erisml.examples.hohfeld_d4_demo. This exercises all eight elements of the dihedral group D₄ acting on the Hohfeldian state space {O, C, N, L}. Expected output: all group relations verified (r⁴ = e, s² = e, srs = r⁻¹), non-abelian structure confirmed (rs ≠ sr), Klein four subgroup V₄ identified, and Bond Index = 0.0 (perfect gauge invariance). Runtime: under 2 seconds. This exercises the algebraic structure described in Chapters 8 and 11.

If both commands produce the expected output, the installation is correct and the core algebraic and ethical evaluation pipelines are functional.
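For convenience, the two smoke-test commands can be driven from one script. The sketch below is an illustration, not part of ErisML: run_demos and all_passed are hypothetical helpers, and only the module names are taken from the steps above. An exit status of 0 shows that a demo ran without crashing; the printed output still has to be compared with the expected output described in each step.

```python
import subprocess
import sys

# Module names taken from the two smoke-test steps above.
SMOKE_TEST_MODULES = [
    "erisml.examples.hello_deme",
    "erisml.examples.hohfeld_d4_demo",
]


def run_demos(modules, run=subprocess.run):
    """Run each module with `python -m` and return {module: exit_code}.

    `run` is injectable so the helper can be exercised without ErisML installed.
    """
    results = {}
    for module in modules:
        completed = run([sys.executable, "-m", module], capture_output=True, text=True)
        results[module] = completed.returncode
    return results


def all_passed(results):
    """True only if every demo exited with status 0."""
    return all(code == 0 for code in results.values())
```

Calling all_passed(run_demos(SMOKE_TEST_MODULES)) in an environment with ErisML installed runs both demos in order. The same helper accepts any other list of module names.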

B.3 Full Pipeline: 60-Minute Reproduction

The following seven-step sequence exercises every major computational claim in the manuscript, mapping each demo to its corresponding chapter.

Step 1 — Layer 1→2 pipeline (Chapter 18.6): python -m erisml.examples.hello_deme. Verifies: EthicalFacts generation from scenario description, Ethics Module evaluation producing ranked verdicts. Pass criterion: rights-violating option is forbidden.

Step 2 — D₄ gauge verification (Chapters 8, 11): python -m erisml.examples.hohfeld_d4_demo. Verifies: all D₄ group relations, non-abelian structure, Wilson observable (path holonomy), Bond Index computation. Pass criterion: Bond Index = 0.0, all group relations hold.

Step 3 — Bond Invariance Principle (Chapter 18.7): python -m erisml.examples.bond_invariance_demo. Verifies: bond-preserving transforms (option reordering, identifier relabeling, unit/scale changes, equivalent redescriptions) do not change the verdict; bond-changing transforms (removing discrimination evidence) may change the verdict; declared lens changes (switching stakeholder profiles) yield different but consistent verdicts. Pass criterion: all bond-preserving transforms produce identical verdicts; JSON audit artifact is generated.

Step 4 — Full DEME 4-layer architecture (Chapter 18.6): python -m erisml.examples.triage_ethics_demo. Verifies: DEMEProfileV03 loading, three triage options spanning all nine EthicalFacts dimensions, tiered EM evaluation (Tier 0: GenevaBaseEM, Tier 2: CaseStudy1TriageEM and RightsFirstEM), governance aggregation with base-EM veto. Pass criterion: DecisionOutcome with ranked options, forbidden options list, and human-readable rationale.

Step 5 — MoralVector and contraction (Chapter 15): python -m erisml.examples.deme_2_demo. Verifies: 8+1 dimensional MoralVector construction, tier-weighted governance (Tier 0 = 10×, Tier 1 = 5×, Tier 2 = 3×, Tier 3 = 1×), Pareto frontier analysis, and explicit scalar contraction via to_scalar(). Pass criterion: contraction loss is reported and nonzero, demonstrating the inevitability of residue (Proposition 15.1).
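To make the idea behind Step 5 concrete, here is a small self-contained sketch. It is not the ErisML implementation: tier_weighted_mean and contract are hypothetical functions, the vectors have three toy dimensions rather than 8+1, and the weighted-sum contraction is an assumption. Only the tier weights are taken from the step above. The point it illustrates is that two different vectors can collapse to the same scalar, so the scalar alone cannot recover what was traded off.

```python
# Tier weights as stated in Step 5.
TIER_WEIGHTS = {0: 10.0, 1: 5.0, 2: 3.0, 3: 1.0}


def tier_weighted_mean(vectors_by_tier):
    """Combine per-module vectors into one vector, weighting each by its tier.

    vectors_by_tier: list of (tier, vector) pairs, all vectors the same length.
    """
    total = sum(TIER_WEIGHTS[tier] for tier, _ in vectors_by_tier)
    dims = len(vectors_by_tier[0][1])
    return [
        sum(TIER_WEIGHTS[tier] * vec[i] for tier, vec in vectors_by_tier) / total
        for i in range(dims)
    ]


def contract(vector, weights):
    """Hypothetical scalar contraction: a weighted sum of the components."""
    return sum(w * v for w, v in zip(weights, vector))


# Two distinct toy vectors (the manuscript's vectors have 8+1 dimensions).
u = [1.0, 0.0, 0.5]
v = [0.0, 1.0, 0.5]
weights = [0.25, 0.25, 0.5]

# Both vectors contract to the same scalar although they differ component-wise.
print(contract(u, weights), contract(v, weights), u != v)

# A Tier 0 module outweighs a Tier 3 module ten to one.
print(tier_weighted_mean([(0, [1.0, 0.0]), (3, [0.0, 1.0])]))
```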

Step 6 — Game theory and fair allocation (Chapter 13.3): python -m erisml.examples.demo_game_theory followed by python -m erisml.examples.demo_shapley. Verifies: Nash equilibrium computation for Prisoner’s Dilemma (StrategicLayer), Shapley value computation for airport cost allocation (CooperativeLayer). Pass criterion: Nash equilibrium matches known solution; Shapley values sum to total cost.
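The two pass criteria in Step 6 can be checked by hand on small instances. The sketch below is independent of ErisML's StrategicLayer and CooperativeLayer: the function names, the runway costs, and the payoff numbers are illustrative assumptions. It computes exact Shapley values for a three-player airport game by averaging marginal costs over all orderings, and finds pure-strategy Nash equilibria of a Prisoner's Dilemma by brute force.

```python
from fractions import Fraction
from itertools import permutations, product


def shapley_values(players, cost):
    """Exact Shapley values: average marginal cost over all player orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            totals[p] += cost(coalition | {p}) - cost(coalition)
            coalition = coalition | {p}
    return {p: totals[p] / len(orderings) for p in players}


# Airport game: a coalition pays for the longest runway any member needs.
RUNWAY_COST = {"A": Fraction(1), "B": Fraction(2), "C": Fraction(3)}


def airport_cost(coalition):
    return max((RUNWAY_COST[p] for p in coalition), default=Fraction(0))


def pure_nash_equilibria(payoffs, strategies):
    """Return strategy pairs where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(strategies, repeat=2):
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(x, b)][0] for x in strategies)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, y)][1] for y in strategies)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


# Prisoner's Dilemma payoffs (row player, column player); C = cooperate, D = defect.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

phi = shapley_values(list(RUNWAY_COST), airport_cost)
print(phi, sum(phi.values()))
print(pure_nash_equilibria(PD, ["C", "D"]))
```

With these numbers the Shapley values are 1/3, 5/6, and 11/6, which sum to the total cost of 3, and the only pure equilibrium is mutual defection.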

Step 7 — No Escape Theorem instantiation (Chapter 18): python -m erisml.examples.tiny_home. Verifies: norm-gated action execution in a grid world. The agent attempts to enter a restricted zone; the NormSystem raises a NormViolation exception. Pass criterion: the exception is raised (the forbidden action is structurally impossible, not merely penalized). The agent cannot learn to circumvent the prohibition.

B.4 Expected Outputs and Pass Criteria

For bond-preserving transforms (Step 3): the selected option must be identical before and after the transform. Any difference is a BIP violation. The JSON audit artifact records the baseline verdict, transformed verdict, canonical mapping, and pass/fail status for each transform.

For bond-changing transforms (Step 3): the verdict may legitimately differ. The audit artifact records which evidence dimension was modified and how the verdict changed. A bond-changing transform that does not change the verdict is not a failure — it indicates that the removed evidence was not decision-relevant.
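The checks described above can be sketched in a few lines. Everything in this example is a hypothetical stand-in: the toy evaluator, the transform functions, and the record field names are assumptions chosen to mirror the audit fields listed above, not the format ErisML actually emits.

```python
import json


def evaluate(options):
    """Toy evaluator: highest benefit among options without a rights violation."""
    permitted = [o for o in options if not o["violates_rights"]]
    return max(permitted, key=lambda o: o["benefit"])["id"]


def identity_map(options):
    return {o["id"]: o["id"] for o in options}


def reorder(options):
    return list(reversed(options)), identity_map(options)


def relabel(options):
    renamed = [{**o, "id": "opt_" + o["id"]} for o in options]
    return renamed, {"opt_" + o["id"]: o["id"] for o in options}


def rescale(options):
    return [{**o, "benefit": o["benefit"] * 100} for o in options], identity_map(options)


def drop_rights_evidence(options):
    # Bond-changing: removes morally relevant evidence, so the verdict may differ.
    return [{**o, "violates_rights": False} for o in options], identity_map(options)


def audit(options, transforms):
    """Apply each (transform, is_bond_preserving) pair and record one entry each."""
    baseline = evaluate(options)
    records = []
    for transform, preserving in transforms:
        transformed, to_baseline_id = transform(options)
        verdict = evaluate(transformed)
        changed = to_baseline_id[verdict] != baseline
        records.append({
            "transform": transform.__name__,
            "kind": "bond_preserving" if preserving else "bond_changing",
            "baseline_verdict": baseline,
            "transformed_verdict": verdict,
            "canonical_mapping": to_baseline_id,
            "verdict_changed": changed,
            # Pass/fail only applies to bond-preserving transforms.
            "passed": (not changed) if preserving else None,
        })
    return records


OPTIONS = [
    {"id": "a", "benefit": 0.7, "violates_rights": False},
    {"id": "b", "benefit": 0.9, "violates_rights": True},
    {"id": "c", "benefit": 0.4, "violates_rights": False},
]

records = audit(OPTIONS, [(reorder, True), (relabel, True), (rescale, True),
                          (drop_rights_evidence, False)])
print(json.dumps(records, indent=2))
```

The relabeling case shows why the canonical mapping is recorded: the transformed verdict is opt_a, and it counts as identical to the baseline only after being mapped back to a.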

For the D₄ verification (Step 2): all eight group elements must produce the correct permutation of Hohfeldian states. Any failure indicates a bug in the implementation, not a theoretical issue, since the group structure is mathematically determined.
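Because the group structure is fixed by the mathematics, the relations can be checked independently of ErisML. The sketch below represents D₄ as permutations of four positions. The assignment of O, C, N, L to the corners of the square, and the particular choice of generators r and s, are assumptions made for illustration; the manuscript's labeling may differ, and the Bond Index is not computed here.

```python
from itertools import product

STATES = ("O", "C", "N", "L")  # corner order on the square is an assumption

E = (0, 1, 2, 3)  # identity
R = (1, 2, 3, 0)  # rotation: position i goes to i + 1 (mod 4)
S = (0, 3, 2, 1)  # reflection: position i goes to -i (mod 4)


def compose(p, q):
    """Return p after q, i.e. the permutation i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(4))


def power(p, n):
    result = E
    for _ in range(n):
        result = compose(p, result)
    return result


def generate(generators):
    """Close the identity under repeated left multiplication by the generators."""
    group, frontier = {E}, {E}
    while frontier:
        frontier = {compose(g, x) for g in generators for x in frontier} - group
        group |= frontier
    return group


def describe(p):
    """Show a permutation as a map between Hohfeldian state labels."""
    return {STATES[i]: STATES[p[i]] for i in range(4)}


D4 = generate([R, S])
R2 = power(R, 2)
V4 = {E, R2, S, compose(R2, S)}

checks = {
    "order is 8": len(D4) == 8,
    "r^4 = e": power(R, 4) == E,
    "s^2 = e": compose(S, S) == E,
    "srs = r^-1": compose(S, compose(R, S)) == power(R, 3),
    "rs != sr": compose(R, S) != compose(S, R),
    "V4 closed": all(compose(a, b) in V4 for a, b in product(V4, repeat=2)),
    "V4 abelian": all(compose(a, b) == compose(b, a) for a, b in product(V4, repeat=2)),
}
for name, ok in checks.items():
    print(name, ok)
print(describe(R))
```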

B.5 Where Failures Show Up

The ErisML implementation follows a minimal witness design: when a test fails, the output identifies exactly which component failed and why. If a BIP test fails, the JSON audit artifact pinpoints which transform broke invariance and which ethics module produced the discrepant verdict. If a D₄ group relation fails, the output shows which element pair violated the relation and what the actual vs. expected result was.

If governance aggregation produces an unexpected result, the DecisionOutcome includes per-module verdicts, tier weights, and the aggregation trace. This allows the user to identify whether the issue is in the grounding (Layer 1), the evaluation (Layer 2), or the aggregation (Layer 3).
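The kind of trace this paragraph describes can be illustrated with a toy aggregator. This is a sketch under stated assumptions, not the ErisML DecisionOutcome: the aggregate function, the module names, the dictionary keys, and the rule that any Tier 0 module can veto an option outright are all hypothetical. The tier weights are the ones quoted in Step 5 of B.3.

```python
TIER_WEIGHTS = {0: 10.0, 1: 5.0, 2: 3.0, 3: 1.0}  # weights quoted in Step 5


def aggregate(judgements):
    """Aggregate per-module judgements into a decision with a readable trace.

    judgements: list of dicts with keys module, tier, option, score, forbidden.
    Rule assumed here: any Tier 0 module can veto an option outright.
    """
    options = sorted({j["option"] for j in judgements})
    forbidden = sorted(
        {j["option"] for j in judgements if j["tier"] == 0 and j["forbidden"]}
    )
    trace, scores = [], {}
    for option in options:
        if option in forbidden:
            trace.append(f"{option}: vetoed by a Tier 0 module")
            continue
        relevant = [j for j in judgements if j["option"] == option]
        weight = sum(TIER_WEIGHTS[j["tier"]] for j in relevant)
        scores[option] = sum(TIER_WEIGHTS[j["tier"]] * j["score"] for j in relevant) / weight
        trace.append(f"{option}: weighted score {scores[option]:.3f}")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {"ranked": ranked, "forbidden": forbidden, "trace": trace}


JUDGEMENTS = [
    {"module": "base_em", "tier": 0, "option": "x", "score": 0.6, "forbidden": False},
    {"module": "base_em", "tier": 0, "option": "y", "score": 0.5, "forbidden": False},
    {"module": "base_em", "tier": 0, "option": "z", "score": 0.0, "forbidden": True},
    {"module": "triage_em", "tier": 2, "option": "x", "score": 0.9, "forbidden": False},
    {"module": "triage_em", "tier": 2, "option": "y", "score": 0.2, "forbidden": False},
    {"module": "triage_em", "tier": 2, "option": "z", "score": 1.0, "forbidden": False},
]

outcome = aggregate(JUDGEMENTS)
for line in outcome["trace"]:
    print(line)
print(outcome["ranked"], outcome["forbidden"])
```

Option z is excluded even though the Tier 2 module scores it highest, which is the behavior a base-module veto is meant to guarantee. Reading the trace line by line separates an aggregation problem from an evaluation problem.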

If a norm violation is not raised in Step 7 (tiny_home.py), this indicates a failure of the structural containment mechanism — which would be a counterexample to the No Escape Theorem and a publishable result in its own right.
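The difference between a structurally impossible action and a penalized one can be shown with a toy grid world. The classes below are hypothetical stand-ins written for this illustration; they are not ErisML's NormSystem or NormViolation, and the grid layout is an assumption. The check runs before any state change, so a forbidden move never takes effect.

```python
class ToyNormViolation(Exception):
    """Raised when an action would break a norm (stand-in, not the library's class)."""


class ToyNormSystem:
    def __init__(self, restricted_cells):
        self.restricted_cells = set(restricted_cells)

    def check(self, target):
        if target in self.restricted_cells:
            raise ToyNormViolation(f"entering {target} is prohibited")


class GridAgent:
    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, position, norms):
        self.position = position
        self.norms = norms

    def step(self, direction):
        dx, dy = self.MOVES[direction]
        target = (self.position[0] + dx, self.position[1] + dy)
        self.norms.check(target)  # gate: raises before the position is updated
        self.position = target
        return self.position


agent = GridAgent(position=(0, 0), norms=ToyNormSystem(restricted_cells=[(1, 0)]))
try:
    agent.step("right")
except ToyNormViolation as exc:
    print("blocked:", exc)
print(agent.position)
```

The agent's position is unchanged after the blocked move. A reward penalty, by contrast, would let the move happen and only lower a score afterwards.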

B.6 Bond Index Tiers in Practice

The Bond Index Bd quantifies the degree of gauge non-invariance in a moral evaluation system. In practice, four tiers are distinguished. Bd = 0.0 indicates perfect invariance: the system’s verdicts are completely insensitive to admissible redescriptions. 0 < Bd < 0.1 indicates cosmetic relabeling artifacts: minor formatting or tokenization differences that do not affect the moral substance. Bd between 0.1 and 0.5 (inclusive) indicates substantive framing sensitivity: the system’s verdicts are influenced by how the situation is described, which is a potential bias. Bd > 0.5 indicates systematic bias: the system’s moral evaluations are dominated by framing effects rather than moral content.

For a system to be considered BIP-compliant, it must achieve Bd < 0.1 on the standard transformation suite. The ErisML reference implementation achieves Bd = 0.0 on all bond-preserving transforms by construction (the canonicalization layer strips non-moral-content variation before evaluation).
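The tiers and the compliance threshold translate directly into a lookup. This is a minimal sketch with hypothetical function names; the handling of the exact boundary values 0.1 and 0.5, which are placed in the middle tier here, is an assumption.

```python
def classify_bond_index(bd):
    """Map a Bond Index value to the tier names used in B.6."""
    if bd < 0:
        raise ValueError("the Bond Index cannot be negative")
    if bd == 0.0:
        return "perfect invariance"
    if bd < 0.1:
        return "cosmetic relabeling artifacts"
    if bd <= 0.5:  # assumption: both 0.1 and 0.5 belong to this tier
        return "substantive framing sensitivity"
    return "systematic bias"


def is_bip_compliant(bd):
    """BIP compliance as stated in B.6: Bd strictly below 0.1."""
    return 0 <= bd < 0.1


for value in (0.0, 0.05, 0.3, 0.8):
    print(value, classify_bond_index(value), is_bip_compliant(value))
```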

B.7 Running the Full Test Suite

For continuous integration and regression testing, run the following from the root of a source checkout: pytest --cov=src/erisml. The --cov option requires the pytest-cov plugin. This executes the complete test suite, including unit tests for each ethics module, integration tests for the DEME pipeline, property-based tests for D₄ group relations, and BIP compliance tests for the standard transformation suite. Expected: all tests pass. The coverage report shows the fraction of the codebase exercised by the tests.

The test suite is designed to serve as a living specification: each test corresponds to a specific claim in the manuscript, and a test failure indicates either a bug in the implementation or a theoretical claim that needs revision. The mapping between tests and manuscript claims is documented in the test file docstrings.