Appendix C: Human-Subjects Research Roadmap

The geometric framework makes empirical predictions about human moral cognition that are testable but have not yet been tested with human participants. This appendix sketches four experimental protocols that would provide decisive evidence for or against the framework’s central claims. Each protocol specifies the prediction, the confirmation criterion, the falsification criterion, approximate sample size targets, and IRB considerations. These protocols are offered as a research program, not as completed studies.

C.1 Study 1: Bilingual Invariance (BIP Test)

Theoretical prediction. The Bond Invariance Principle (Axiom 5.1) predicts that moral evaluations are invariant under admissible redescriptions of a situation. Language change is an admissible transformation (it does not alter the moral content of the situation). Therefore, bilingual participants presented with the same moral dilemma in two languages should produce the same satisfaction rankings.

Protocol. Recruit N ≈ 200 balanced bilingual participants (e.g., English-Spanish, English-Mandarin). Present 20 moral dilemmas, each in both languages (counterbalanced for order). Participants rate each option on a 7-point scale. Primary outcome: Spearman rank correlation between language-paired ratings. Secondary outcome: a within-subject Wilcoxon signed-rank test for any systematic direction of difference.

Confirmation criterion: mean rank correlation > 0.85 across participants and dilemma pairs, with no systematic direction of difference (p > 0.05 for the signed-rank test). This would confirm that moral evaluation is, to a close approximation, language-invariant. Falsification criterion: mean rank correlation < 0.60, or systematic rank reversals correlated with known linguistic framing effects (e.g., gain/loss framing in one language but not the other). This would indicate that moral evaluation is substantively language-dependent, contradicting BIP.

Power analysis: for a medium effect size (d = 0.5) with α = 0.05 and power = 0.80, N ≈ 200 is sufficient (paired-sample design). IRB classification: standard survey research with informed consent; no deception; minimal risk. Participants are compensated for their time. Dilemmas are drawn from published moral psychology instruments to ensure face validity.
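The planned primary and secondary analyses can be sketched as follows. This is an illustrative simulation under the BIP-true hypothesis, assuming numpy and scipy are available; the noise level (SD 0.3) and the 20-dilemma × 5-option design are assumptions for demonstration, not empirical values.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon

rng = np.random.default_rng(0)

# Simulate one participant: 20 dilemmas x 5 options, rated in two languages.
# Under BIP, both language conditions share a latent evaluation plus noise.
latent = rng.integers(1, 8, size=(20, 5)).astype(float)
lang_a = latent + rng.normal(0, 0.3, latent.shape)
lang_b = latent + rng.normal(0, 0.3, latent.shape)

# Primary outcome: Spearman rank correlation per dilemma, then averaged.
rhos = [spearmanr(a, b)[0] for a, b in zip(lang_a, lang_b)]
mean_rho = float(np.mean(rhos))

# Secondary outcome: Wilcoxon signed-rank test for a systematic
# direction of difference across all paired ratings.
stat, p = wilcoxon(lang_a.ravel(), lang_b.ravel())

print(f"mean Spearman rho = {mean_rho:.2f}, signed-rank p = {p:.3f}")
```

In the real study the same two statistics would be computed per participant and aggregated, with the confirmation thresholds (mean rho > 0.85, signed-rank p > 0.05) applied at the group level.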

C.2 Study 2: Order Effects and Non-Commutativity

Theoretical prediction. Chapter 13 predicts that moral evaluations exhibit non-commutativity for certain dimension pairs: evaluating fairness-then-autonomy may produce a different result than evaluating autonomy-then-fairness, because the evaluation operators do not commute when they span different strata. For same-stratum evaluation pairs, the framework predicts commutativity (no order effect).

Protocol. Recruit N ≈ 300 participants. Present moral scenarios that require evaluating two dimensions sequentially. Manipulate the order of evaluation (AB vs. BA) within subjects. Key manipulation: half the pairs span stratum boundaries (predicted non-commutative); half are within the same stratum (predicted commutative). Primary outcome: order × stratum-type interaction in a repeated-measures ANOVA.

Confirmation criterion: significant order × stratum-type interaction (p < 0.01), with order effects present in cross-stratum pairs and absent in same-stratum pairs. Falsification criterion: no order effects in any condition (moral evaluation is fully commutative), or order effects in same-stratum conditions (the stratum structure does not predict commutativity). Either falsifying result would challenge the quantum normative dynamics model of Chapter 12.

Power analysis: for a small-to-medium interaction effect (f = 0.20) with α = 0.05 and power = 0.80, N ≈ 300 is sufficient (2×2×k within-subjects design with k ≈ 10 scenario pairs). IRB classification: minimal risk; no deception required.
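As a simplified stand-in for the full repeated-measures ANOVA, the order × stratum-type interaction can be reduced, per subject, to a difference of order effects and tested against zero with a one-sample t-test. The sketch below simulates data in which the predicted pattern holds; the effect size (0.4 rating points) and noise level are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
n_subjects = 300

# Per subject, per scenario pair: rating under order AB minus order BA.
# Cross-stratum pairs carry a simulated order effect; same-stratum pairs
# do not (the framework's prediction). Each subject sees ~10 pairs per type.
cross_order_diff = rng.normal(0.4, 1.0, size=(n_subjects, 10)).mean(axis=1)
same_order_diff = rng.normal(0.0, 1.0, size=(n_subjects, 10)).mean(axis=1)

# The order x stratum-type interaction reduces, per subject, to the
# difference of the two order effects; test that difference against zero.
interaction = cross_order_diff - same_order_diff
t, p = ttest_1samp(interaction, 0.0)
print(f"interaction t = {t:.2f}, p = {p:.4g}")
```

The full analysis would use a repeated-measures ANOVA (e.g., statsmodels' AnovaRM) to model scenario-pair variance as well, but this per-subject contrast captures the same interaction term.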

C.3 Study 3: Stratification Gate Activation

Theoretical prediction. Chapter 8 predicts that moral evaluation undergoes discrete jumps at stratum boundaries, triggered by semantic gates. For example, adding the phrase “she promised” to a convenience scenario should activate the obligation gate, transitioning the evaluation from the discretionary stratum to the obligatory stratum. The prediction is a bimodal response distribution at the boundary, not a continuous shift.

Protocol. Recruit N ≈ 150 per gate type (3 gate types × 150 = 450 total). Present base scenarios (no gate trigger) and modified scenarios (with gate trigger). Measure moral evaluation on a continuous scale. Primary outcome: distribution shape at the boundary — bimodal (predicted) vs. unimodal (null hypothesis). Use Hartigan’s dip test for bimodality.

Confirmation criterion: significant bimodality (dip test p < 0.05) in the gate-triggered condition, with unimodal distribution in the base condition. Falsification criterion: unimodal distribution in both conditions (the gate does not produce a discrete jump), or bimodality in the base condition (the stratum structure does not depend on the gate trigger). Power analysis: N ≈ 150 per gate type provides adequate power for the dip test with expected bimodal separation of at least 1 standard deviation.
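Hartigan's dip test is not available in scipy (it requires an external package such as diptest), so the sketch below uses Sarle's bimodality coefficient as a proxy: values above ~5/9 suggest bimodality. The simulated distributions (a unimodal base condition and a two-mode gated condition) are illustrative assumptions; the real analysis would apply the dip test as specified in the protocol.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient: values above ~5/9 suggest a
    bimodal (or heavily skewed) distribution."""
    n = len(x)
    g1 = skew(x)       # sample skewness
    g2 = kurtosis(x)   # excess kurtosis (Fisher definition)
    return (g1**2 + 1) / (g2 + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))

rng = np.random.default_rng(2)
base = rng.normal(4.0, 1.0, 150)                    # unimodal control
gated = np.concatenate([rng.normal(2.0, 0.5, 75),   # discretionary mode
                        rng.normal(6.0, 0.5, 75)])  # obligatory mode

print(f"base BC = {bimodality_coefficient(base):.2f}, "
      f"gated BC = {bimodality_coefficient(gated):.2f}")
```

A normal distribution yields a coefficient near 1/3, well below the 5/9 threshold, while the two well-separated modes in the gated condition push it well above.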

IRB considerations: some gate types involve distressing content (abuse stratum, danger stratum). Scenarios must be reviewed by an ethics board for participant welfare. Debriefing protocol required. Option to skip distressing scenarios without penalty.

C.4 Study 4: Metric Signature Recovery

Theoretical prediction. Chapter 9 predicts that the moral manifold has (at least) nine independent dimensions at generic points. If participants’ moral evaluations reflect this structure, factor analysis of pairwise dimension importance ratings should recover ≥ 8 independent factors with eigenvalue > 1.

Protocol. Recruit N ≈ 500 participants. Present 36 pairwise comparisons of the nine moral dimensions (welfare, autonomy, fairness, rights, privacy, societal impact, virtue/care, legitimacy, epistemic quality). Participants rate which dimension is more important in each pair on a 7-point scale. Apply exploratory factor analysis (EFA) to the response matrix. Secondary analysis: confirmatory factor analysis (CFA) with the nine-dimensional model.

Confirmation criterion: EFA recovers ≥ 8 factors with eigenvalue > 1, and CFA shows acceptable fit (χ²/df < 3, CFI > 0.90, RMSEA < 0.08) for the nine-dimensional model. Falsification criterion: fewer than 6 recoverable factors, or the recovered dimensions do not map to the predicted nine. Power analysis: N ≈ 500 provides a subject-to-variable ratio of ≈ 14:1, exceeding the recommended minimum of 10:1 for EFA.
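The eigenvalue-counting step of the confirmation criterion can be sketched with a simulation. One point worth noting: because each item is a pairwise contrast between two dimensions, the nine latent weights are identifiable only up to an overall shift, so contrast coding yields at most eight independent factors, which is consistent with the "≥ 8 factors" criterion. The loading structure and noise level below are illustrative assumptions, assuming numpy is available.

```python
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_items = 500, 36

# Simulate 36 pairwise-importance items driven by 9 latent dimension
# weights. Each item contrasts two of the nine dimensions (+1 / -1 coding).
pairs = [(i, j) for i in range(9) for j in range(i + 1, 9)]
loadings = np.zeros((n_items, 9))
for k, (i, j) in enumerate(pairs):
    loadings[k, i], loadings[k, j] = 1.0, -1.0

latent = rng.normal(size=(n_subjects, 9))  # per-subject dimension weights
responses = latent @ loadings.T + rng.normal(0, 0.5, (n_subjects, n_items))

# Kaiser criterion: count eigenvalues of the item correlation matrix > 1.
eigvals = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))
n_factors = int((eigvals > 1.0).sum())
print(f"factors with eigenvalue > 1: {n_factors}")
```

Here the signal eigenvalues sit near 4 and the noise eigenvalues near 0.1, so the Kaiser criterion recovers exactly the eight contrast factors; the CFA step would additionally test whether those factors map onto the predicted nine dimensions.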

C.5 Confirmation vs. Falsification Summary

Study 1 (Bilingual Invariance): Confirms BIP if rank correlation > 0.85; falsifies if < 0.60 with systematic reversals. Study 2 (Order Effects): Confirms QND if order × stratum interaction is significant; falsifies if no order effects or if same-stratum pairs show effects. Study 3 (Gate Activation): Confirms stratification if bimodal distribution at boundary; falsifies if continuous. Study 4 (Metric Recovery): Confirms manifold structure if ≥ 8 factors recovered; falsifies if < 6.

No single study is decisive. The framework would be substantially confirmed by positive results in all four studies, and substantially falsified by negative results in three or more. Mixed results (some confirmed, some falsified) would indicate that the geometric framework captures some but not all aspects of human moral cognition — which would be scientifically interesting and would motivate targeted refinements.

C.6 Ethical and Practical Considerations

All studies require IRB approval. Participants must provide informed consent. Compensation must be fair and not coercive. Scenarios involving distressing content (abuse, danger, life-threatening situations) require sensitivity review, debriefing protocols, and the option to withdraw without penalty. Cultural diversity in sampling is essential: the framework claims universality, and the participant pool must reflect this. We recommend recruiting from at least three cultural regions (Western, East Asian, Sub-Saharan African) to test cross-cultural validity.

All studies should be preregistered (e.g., on OSF or AsPredicted) before data collection. Analysis plans, including the specific statistical tests and thresholds, should be specified in the preregistration. Data and analysis code should be made publicly available upon publication. These open-science practices are not optional; they are required by the reproducibility standards that the framework itself advocates (Chapter 17).