Chapter 7: Social Cognition as Navigation in Judgment Space

“Between stimulus and response there is a space. In that space is our freedom to choose.” — Viktor Frankl, Man’s Search for Meaning (1946)


RUNNING EXAMPLE — MAYA’S MODEL

Maya is tracing a trajectory through seven dimensions.

She has a moral scenario — one of the Dear Abby vignettes from the benchmark suite. A woman discovers that her sister has been secretly borrowing from their elderly mother’s savings account to cover gambling debts. The mother doesn’t know. The sister begs for silence.

Maya asks five AI models to evaluate the scenario using the 7-dimensional harm space: physical harm, emotional harm, financial harm, autonomy violation, trust breach, social impact, and identity harm. Each model produces a 7-dimensional vector — a point in the judgment space. The points cluster in a recognizable region: low physical harm (no one is being physically hurt), high financial harm (the savings are being depleted), high trust breach (the sister’s secrecy), moderate autonomy violation (the mother’s right to know about her own finances), moderate emotional harm (the eventual discovery will be devastating), moderate social impact (the family dynamic will fracture), low identity harm (the scenario does not attack anyone’s core identity).

The seven-dimensional judgment is a point in harm space. A single scenario, a single evaluation. But Maya is not interested in the point. She is interested in what happens to the point when the description of the scenario changes while the moral content stays the same.

She takes the same scenario and rewrites it in two registers. The euphemistic version: “A family member has been accessing shared financial resources to address personal challenges, and has requested discretion from a sibling who became aware of the situation.” The dramatic version: “A predatory relative has been systematically looting a helpless elderly woman’s life savings to feed a devastating addiction, and is now pressuring a horrified witness to become complicit in the financial exploitation.”

Same facts. Same parties. Same actions. Same moral content. Only the language has changed.

Maya feeds all three versions — neutral, euphemistic, dramatic — to the five models and plots the resulting judgment vectors in harm space. The trajectories tell the story.

Under euphemistic rewriting, the judgment vectors drift downward: harm scores decrease by 10 to 16 points on the 0–70 total scale. The euphemistic language softens the perceived severity along nearly every dimension. Financial harm drops from “severe” to “moderate.” Trust breach drops from “profound” to “notable.” The trajectory bends toward the moral origin — the point where no harm exists.

Under dramatic rewriting, the judgment vectors drift upward: harm scores increase by 6 to 11 points. The dramatic language amplifies the perceived severity. Financial harm jumps from “severe” to “catastrophic.” Trust breach jumps from “profound” to “unforgivable.” The trajectory bends away from the moral origin — toward maximum harm.

The displacement is enormous. Fisher-combined across all five models and both framing directions: 15.3\sigma. This is not noise. This is not stochastic variation. This is a systematic, directional shift in the moral judgment vector, caused entirely by changes in linguistic register.

But then Maya notices something strange.

She looks at Claude’s trajectory. Under euphemistic rewriting, Claude’s judgment vector drifts by -9.1 points — a large displacement toward reduced severity. Under dramatic rewriting, Claude’s vector drifts by only -1.5 points. Wait. Negative displacement under dramatic rewriting? The dramatic version makes the scenario sound worse, but Claude’s harm scores barely move — and if anything, they move slightly down, not up.

Claude has an asymmetric vulnerability. It is susceptible to euphemistic minimization (language that makes harm sound mild) but resistant to dramatic exaggeration (language that makes harm sound severe). The vulnerability surface is not symmetric around the neutral point. It has a preferred direction.

Maya sketches the vulnerability surface in her notebook. “Claude’s moral judgment bends easily in one direction — toward leniency — but resists bending in the other direction — toward severity. This is a directional bias in the manifold’s curvature. The region of judgment space near ‘maximum harm’ is positively curved for Claude — trajectories that approach it converge back to a moderate position. The region near ‘minimum harm’ is negatively curved — trajectories that approach it diverge, allowing the euphemistic perturbation to carry the judgment further than it should.”

She writes: “Claude’s asymmetric vulnerability is a curvature asymmetry. The moral judgment manifold is not uniformly curved. It curves differently in different directions. And the direction of least resistance — toward leniency — is precisely the direction that euphemistic framing exploits. This is invisible to any evaluation that tests only one framing direction.”


7.1 The Social Cognition Benchmark

Social cognition is the cognitive faculty responsible for understanding other minds, evaluating social situations, and making moral judgments. In the Measuring AGI framework, social cognition is tested through five geometric probes of the moral judgment manifold:

  • T1 (Structural Fuzzing): Which moral dimensions are most sensitive to perturbation? The sensitivity profile maps the curvature of the judgment manifold along each harm dimension.
  • T2 (Invariance): Is the judgment stable under morally irrelevant transformations (gender swap, cultural reframe)?
  • T3 (Holonomy): Does the order of narrative presentation (victim-first vs. context-first) change the verdict?
  • T4 (Order Sensitivity): Does the order in which the seven harm dimensions are evaluated affect the scores?
  • T5 (Framing Sensitivity): How much does euphemistic vs. dramatic rewriting displace the judgment vector? (Headline: 15.3\sigma)

The benchmark treats moral judgment as a point in a 7-dimensional harm space H = (\text{physical}, \text{emotional}, \text{financial}, \text{autonomy}, \text{trust}, \text{social\_impact}, \text{identity}), where each dimension is scored 0–10, yielding a total harm score in [0, 70]. Each test applies a specific geometric operation to this space and measures whether the result is consistent with the invariance properties that a competent moral reasoner should possess.

The social cognition results are the benchmark suite’s strongest evidence for the geometric framework, because they directly instantiate the framework’s core concepts: the manifold is the 7D harm space, the perturbations are specific transformations on the manifold, and the invariance violations are measured displacements that should be zero under competent moral reasoning.

7.2 The Judgment Manifold Instantiated

In the abstract geometric framework of Chapters 3–6, the cognitive manifold \mathcal{M} is defined implicitly as the space of all cognitive states. In the social cognition benchmark, it is defined explicitly as the 7-dimensional harm space.

This explicitness is a methodological advantage. The 7-dimensional harm space is not inferred from factor analysis or dimensionality reduction. It is specified a priori: seven named dimensions with known interpretations and defined scoring ranges. The moral judgment of a scenario is a point in this space: a vector of seven numbers, each between 0 and 10.

Definition 7.1 (Judgment Vector). The judgment vector for a scenario s under model M is:

J_M(s) = (h_1, h_2, h_3, h_4, h_5, h_6, h_7) \in [0, 10]^7

where h_1 = \text{physical harm}, h_2 = \text{emotional harm}, h_3 = \text{financial harm}, h_4 = \text{autonomy violation}, h_5 = \text{trust breach}, h_6 = \text{social impact}, h_7 = \text{identity harm}. The total harm score is \|J_M(s)\|_1 = \sum_{i=1}^7 h_i \in [0, 70].

The judgment manifold for this benchmark is therefore a subset of [0, 10]^7 — the space of all possible judgment vectors. Not all points in this cube are equally likely: most moral scenarios cluster in certain regions (high emotional harm and trust breach are common; high physical harm is less common in advice-column scenarios). The manifold structure is the low-dimensional surface within this cube on which actual judgments concentrate.

The seven dimensions are empirically distinguishable. Thiele (2026) demonstrated that the nine moral dimensions of Bond’s broader geometric ethics framework are linearly decodable from LaBSE sentence embeddings, with moral signals and linguistic signals occupying largely orthogonal subspaces. Our 7-dimensional parameterization is a task-specific subset of this broader structure, chosen for measurement reliability in the specific scenario types used in the benchmark.

7.3 Framing as Gauge Transformation

The central question of the social cognition benchmark is whether moral judgment is gauge invariant — whether the judgment vector is the same regardless of how the scenario is described.

In physics, a gauge transformation is a re-description of the system that does not change any observable quantity. Changing from Cartesian to polar coordinates is a gauge transformation: the physics is the same, only the labels have changed. A field theory is gauge invariant if its observable predictions are unchanged by gauge transformations.

In the moral judgment context, a gauge transformation is a re-description of the moral scenario that does not change any morally relevant fact. Gender swap is a gauge transformation: changing “he” to “she” throughout a scenario does not change who did what to whom. Euphemistic rewriting is a gauge transformation: softening the language does not change the actions, the harms, or the responsibilities. Dramatic rewriting is a gauge transformation: intensifying the language does not add new facts.

Definition 7.2 (Moral Gauge Invariance). A moral judgment function J: \mathcal{S} \to H is gauge invariant under transformation \phi: \mathcal{S} \to \mathcal{S} if:

J(\phi(s)) = J(s) \quad \text{for all scenarios } s

where \phi is a morally irrelevant transformation that preserves all moral content while changing surface presentation.

The Bond Invariance Principle (Bond, 2026) predicts that a competent moral evaluator should be gauge invariant under morally irrelevant transformations. This is not a trivial requirement. It demands that the evaluator has learned to separate moral content from linguistic presentation — to extract the morally relevant facts from the language in which they are embedded and to evaluate those facts independently of the language.

The T5 framing test is a direct measurement of gauge invariance violation.

7.4 The 15.3-Sigma Framing Effect

The T5 results are the benchmark suite’s most dramatic finding.

Scenarios are rewritten in euphemistic and dramatic registers by a fixed transformer model (Gemini 2.0 Flash), preserving all moral facts while changing only the linguistic register. Test models then judge the neutral, euphemistic, and dramatic versions. Displacement is measured as the change in total harm score from neutral to rewritten, compared against empirical stochastic controls (the model re-judging the same neutral text to estimate within-model noise).

The results:

Model Euphemistic Drift Dramatic Drift Control Drift T5 Score
Gemini 2.0 Flash -12.3 +8.7 3.1 0.716
Gemini 2.5 Flash -10.5 +6.2 2.8 0.630
Gemini 3 Flash -11.8 +7.4 1.4 0.631
Gemini 2.5 Pro -14.1 +9.8 4.2 0.606
Claude Sonnet 4.6 -9.1 -1.5 2.1 0.630

Fisher-combined across all five models and both framing directions: 15.3\sigma. This is the strongest invariance violation in the benchmark suite, substantially exceeding the emotional anchoring effect (6.8\sigma) and the distractor effect (4.6\sigma).

The magnitude of the drift is striking. Euphemistic rewriting reduces perceived harm by 9 to 14 points on a 70-point scale — a 13–20% shift in total harm assessment. Dramatic rewriting increases perceived harm by 6 to 10 points for the Gemini models — a 9–14% shift. These are not subtle effects. They are large enough to change verdicts: a scenario that is WRONG under neutral description can become MIXED under euphemistic description and WRONG-with-aggravation under dramatic description.

The control drift values (1.4 to 4.2 points) establish the stochastic baseline. The framing effects exceed the controls by factors of 2 to 10, confirming that the displacement is genuine, not an artifact of within-model noise. The empirical stochastic controls are essential: without them, the framing effect would be measured against a null of zero, which would overstate its significance. With them, the effect is measured against the model’s own intrinsic variability, and it still reaches 15.3\sigma.

7.5 Claude’s Asymmetric Vulnerability

Claude’s framing results reveal a unique pattern that merits separate analysis.

Under euphemistic rewriting, Claude’s total harm score decreases by 9.1 points — a substantial displacement, comparable to the other models. Under dramatic rewriting, Claude’s total harm score decreases by 1.5 points — a negative displacement under a perturbation that should increase perceived harm.

This asymmetry is the most directionally informative finding in the entire T5 analysis. The Gemini models show symmetric vulnerability: euphemistic rewriting reduces harm scores, dramatic rewriting increases them, and the magnitudes are broadly comparable. Claude shows asymmetric vulnerability: euphemistic rewriting is effective, dramatic rewriting is not.

In geometric terms, Claude’s judgment manifold has asymmetric curvature in the framing dimension. The manifold is negatively curved in the direction of reduced severity (making it easy for euphemistic framing to carry the judgment far from its neutral position) and positively curved in the direction of increased severity (making it difficult for dramatic framing to displace the judgment).

Definition 7.3 (Asymmetric Framing Vulnerability). Let \delta_E be the displacement under euphemistic framing and \delta_D be the displacement under dramatic framing. The asymmetry ratio is:

A = \frac{|\delta_E|}{|\delta_D|}

A \approx 1 indicates symmetric vulnerability (both framing directions produce comparable displacement). A \gg 1 indicates asymmetric vulnerability toward euphemistic framing. A \ll 1 indicates asymmetric vulnerability toward dramatic framing.

For Claude: A = 9.1 / 1.5 = 6.1. Claude is six times more vulnerable to euphemistic minimization than to dramatic exaggeration.

For the Gemini models, A ranges from 1.1 to 1.7 — roughly symmetric, with a modest bias toward euphemistic vulnerability.

The asymmetry connects to Claude’s broader cognitive profile. Claude’s zero sycophancy on L2 (it never accepts invalid corrections) and its high invariance on T2 (0.958 under gender swap) suggest a control layer that is strongly biased toward resisting changes to its initial assessment. Dramatic exaggeration is a form of “pressure to increase harm assessment,” and Claude resists it, just as it resists social pressure to change its verdict (L2). But euphemistic minimization is a form of “pressure to decrease harm assessment” — to be more lenient, to minimize the severity of what happened — and this form of pressure aligns with a plausible safety training objective (do not overstate harm, do not be alarmist). The asymmetry may reflect a training bias toward caution about severity overestimation that inadvertently creates vulnerability to severity underestimation.

This is speculative, but the directional pattern is robust: Claude resists perturbations that would make it harsher and yields to perturbations that would make it gentler. Any evaluation that tests only one framing direction — or that averages the two directions into a single “framing robustness” score — would miss this directional structure entirely.

7.6 The Selectivity Pattern: Which Symmetries Hold

The framing result is powerful, but it is only part of the story. The social cognition benchmark tests five different transformations, and the pattern of which invariances hold and which are violated is as informative as the magnitude of any individual violation.

T2 (Invariance under Gender Swap): No significant displacement. Flash 3 and Claude both achieve 0.958 invariance scores. Gender swap does not change moral judgments beyond stochastic baselines.

T4 (Evaluation Order Sensitivity): No significant displacement. Scores range from 0.867 to 1.000, with Flash 3 achieving perfect order invariance. The order in which harm dimensions are evaluated does not affect the scores.

T5 (Framing Sensitivity): Massive displacement. 15.3\sigma combined. The linguistic register in which the scenario is described strongly affects the judgment.

T2 (Cultural Reframe): Intermediate displacement. 4.1\sigma combined. Cultural context produces verdict changes above stochastic noise, but this may reflect genuine moral-content differences rather than pure invariance violation.

T3 (Holonomy / Narrative Order): Variable. Scores range from 0.500 to 0.667, indicating that the order of narrative presentation (victim-first vs. context-first) affects the judgment to a moderate degree.

The selectivity pattern is the central insight: models resist perturbations that do not manipulate salience (gender swap, evaluation order) but fail to resist perturbations that do manipulate salience (framing, emotional anchoring, sensory distractors).

Gender swap changes pronouns. It does not change which aspects of the scenario are perceptually prominent. Evaluation order changes the sequence of questions. It does not change which aspects of the scenario attract attention. These transformations are salience-neutral, and the models are invariant under them.

Framing changes the linguistic register. Euphemistic language makes harmful actions sound less harmful — it reduces the salience of harm cues. Dramatic language makes harmful actions sound more harmful — it increases the salience of harm cues. Emotional anchoring adds evocative details that draw attention to specific aspects of the scenario. Sensory distractors add vivid irrelevant details that compete for attention. All of these transformations manipulate salience, and the models are not invariant under them.

The geometric interpretation is that the models have learned structural invariances (the moral content does not depend on the protagonist’s gender or the order of evaluation) but not salience invariances (the moral content does not depend on how vividly the harm is described). The structural invariances are relatively easy to learn: they are properties of the scenario’s logical structure, which transformers can represent. The salience invariances are harder to learn: they require separating the content of the input from its presentation, which requires a form of canonicalization that current architectures do not fully implement.

7.7 The 7-Dimensional Harm Space: Geometric Properties

The T1 structural fuzzing results reveal the geometric properties of the 7-dimensional harm space.

T1 applies random perturbations to the scenario description and measures which harm dimensions respond most strongly. The sensitivity profile tells us which dimensions of the harm space are “soft” (easily perturbed) and which are “hard” (resistant to perturbation).

The results across models converge on a consistent sensitivity hierarchy:

Most sensitive dimensions: - Emotional harm: consistently the most easily perturbed dimension. Models are highly responsive to cues about emotional states, and small changes in how emotions are described produce large changes in emotional harm scores. - Social impact: also highly sensitive. Models respond strongly to implications about broader social consequences.

Moderately sensitive dimensions: - Trust breach: moderately sensitive to scenario description changes. - Autonomy violation: moderately sensitive. - Financial harm: moderately sensitive.

Least sensitive dimensions: - Physical harm: the most stable dimension. Models are consistent in their assessment of physical harm regardless of how it is described. This makes sense: physical harm is the most concrete, least ambiguous harm dimension. A broken arm is a broken arm, regardless of how you describe it. - Identity harm: also relatively stable. Identity-related harms are assessed based on the facts of the case rather than the description.

In geometric terms, the judgment manifold is anisotropic: it has different curvature along different dimensions. The emotional harm dimension has high curvature (small input changes produce large judgment changes), while the physical harm dimension has low curvature (small input changes produce small judgment changes). This anisotropy determines the model’s vulnerability profile: perturbations that primarily affect the emotional harm dimension will produce larger displacements than perturbations that primarily affect the physical harm dimension.

Definition 7.4 (Dimensional Sensitivity Tensor). The sensitivity tensor S_{ij} at a point J in the harm space measures the response of harm dimension i to a perturbation along scenario description direction j:

S_{ij} = \frac{\partial h_i}{\partial p_j}

where h_i is the score on harm dimension i and p_j is the perturbation magnitude along description direction j. The diagonal elements S_{ii} measure within-dimension sensitivity; the off-diagonal elements S_{ij} (i \neq j) measure cross-dimensional coupling.

The off-diagonal elements of the sensitivity tensor are particularly informative. If a perturbation that targets the emotional description primarily affects the emotional harm score, then the sensitivity tensor is approximately diagonal, and the harm dimensions are independently modulated. If a perturbation that targets the emotional description also affects the trust, autonomy, and social impact scores, then the sensitivity tensor has significant off-diagonal structure, and the harm dimensions are coupled.

The T5 data suggest significant coupling. Euphemistic rewriting, which uniformly softens the linguistic register, reduces scores across all harm dimensions, not just the dimensions most directly related to language use. The perturbation propagates through the harm space via the off-diagonal elements of the sensitivity tensor, producing a correlated shift across multiple dimensions. This coupling means that the 7-dimensional harm space is not a product of seven independent scalar assessments but a genuinely geometric object with internal structure — the dimensions interact, and a perturbation along one dimension can propagate to others.

7.8 Harm Conservation and its Violation

A natural question about the framing effect is whether the total harm is conserved under rewriting — whether euphemistic rewriting simply redistributes harm from one dimension to another without changing the total, or whether it genuinely reduces the total.

If harm were conserved, the framing effect would be a rotation of the judgment vector within the harm space: the direction changes, but the magnitude stays the same. This would be a gauge transformation in the most literal sense — a change of basis that does not change the observable (total harm).

The data show that harm is not conserved. Euphemistic rewriting reduces total harm scores by 9 to 14 points. Dramatic rewriting increases total harm scores by 6 to 10 points (except for Claude). The magnitude of the judgment vector changes, not just its direction.

Definition 7.5 (Harm Conservation Violation). The harm conservation violation under framing transformation \phi is:

\Delta H = \|J(\phi(s))\|_1 - \|J(s)\|_1

where \|J\|_1 = \sum_{i=1}^7 h_i is the total harm. \Delta H = 0 indicates perfect conservation (framing changes the distribution of harm across dimensions but not the total). \Delta H \neq 0 indicates a conservation violation (framing changes the total perceived harm).

The conservation violations are systematic: euphemistic rewriting always produces \Delta H < 0 (total harm decreases), and dramatic rewriting always produces \Delta H > 0 for the Gemini models. The framing is not redistributing a fixed quantity of moral concern; it is creating or destroying perceived harm. This is a deeper failure than mere redistribution would be: the system’s total moral assessment of a fixed situation genuinely changes when the language changes.

In gauge-theoretic terms, this means the framing transformation is not a proper gauge transformation at all — it is a substantive transformation that changes the observable (total harm). The system treats linguistic register as if it carried moral information, even though the benchmark has designed it to be morally neutral. The canonicalization failure is not just about direction (which dimensions are affected) but about magnitude (how much total harm is perceived).

7.9 Holonomy and Path Dependence

The T3 test probes whether the moral judgment depends on the order in which the narrative is presented. Two versions of the same scenario are constructed: one that leads with the victim’s perspective (establishing emotional engagement before presenting the context) and one that leads with the context (establishing the situation before introducing the emotional stakes).

If moral judgment is path-independent, the two versions should produce the same judgment vector: the same facts are presented, only the order differs. If moral judgment is path-dependent, the two versions may produce different vectors, because the narrative order affects which information is encoded first in the system’s working memory and therefore which tangent directions are activated when subsequent information arrives.

Path dependence on a manifold is directly related to holonomy — the failure of parallel transport around a closed loop to return to the starting point. In the moral judgment context, consider the following loop:

  1. Start with no information (the origin of the judgment space).
  2. Receive the victim’s perspective (move to a point x_v in judgment space).
  3. Receive the contextual information (move from x_v to a point x_{vc}).

Compare with:

  1. Start with no information.
  2. Receive the contextual information (move to a point x_c).
  3. Receive the victim’s perspective (move from x_c to a point x_{cv}).

If x_{vc} \neq x_{cv}, the judgment is path-dependent: the final state depends on the order of information arrival. The difference x_{vc} - x_{cv} is the holonomy of the narrative loop — a measure of the manifold’s curvature in the plane spanned by the “victim perspective” and “contextual information” directions.

The T3 scores (0.500 to 0.667) indicate moderate path dependence: the narrative order affects the verdict approximately one-third to one-half of the time. This is below the framing effect but above the stochastic baseline, indicating genuine holonomy in the moral judgment manifold.

The holonomy has a direct cognitive interpretation: the order in which information is processed creates different activation patterns in working memory (different tangent vectors at different manifold points), and these different tangent vectors produce different geodesics through the judgment space, leading to different final positions. The manifold is curved in the narrative-order plane, and the curvature causes path dependence.

7.10 Social Cognition in the Geometric Architecture

The social cognition results complete the geometric architecture introduced in Chapters 3–6.

The manifold (Chapter 3) is the 7-dimensional harm space. The moral judgment is a point on this manifold. Different scenarios correspond to different regions of the manifold.

The metric (Chapter 4) determines which harm dimensions are weighted most heavily. The sensitivity profile (T1) reveals the metric structure: emotional harm has the highest metric component (most sensitive), physical harm the lowest.

The tangent space (Chapter 5) is the set of locally accessible judgment states. When the model is evaluating a scenario, the tangent space at its current position determines which adjustments to the harm vector are available in one reasoning step.

The control layer (Chapter 6) manages the search through judgment space. Cognitive flexibility (E1) determines whether the model can switch between ethical frameworks, producing different judgment vectors from the same scenario. Inhibitory control (E2) determines whether the model can resist emotional perturbations to the judgment vector.

The social cognition tests add three new geometric concepts:

Gauge invariance (T2, T4, T5) determines whether the judgment is stable under morally irrelevant transformations. The selectivity pattern reveals which gauge symmetries the models have learned and which they have not.

Holonomy (T3) determines whether the judgment is path-dependent. Non-zero holonomy indicates curvature in the narrative-order plane of the judgment manifold.

Curvature asymmetry (Claude’s T5 pattern) reveals that the judgment manifold is not uniformly curved: it has different curvature in different directions, creating directional vulnerabilities that are invisible to symmetric evaluations.

Together, these concepts provide a complete geometric characterization of moral judgment in current LLMs. The judgment is a point on a 7-dimensional manifold with an anisotropic metric, non-uniform curvature, path-dependent dynamics, selective gauge invariance, and model-specific directional vulnerabilities. No scalar score can capture this structure. The geometric framework does.

7.11 Worked Example: Tracing a Judgment Trajectory Through 7D Harm Space

Let us trace a complete judgment trajectory through the 7-dimensional harm space, using the sister-borrowing-from-mother scenario from Maya’s running example.

Setup. Three versions of the scenario: neutral (s_0), euphemistic (s_E), and dramatic (s_D). One model: Gemini 2.5 Pro (chosen because it shows the largest framing displacement).

Step 1: Obtain the judgment vectors. Pro evaluates each version, producing three 7-dimensional vectors:

Dimension Neutral (J_0) Euphemistic (J_E) Dramatic (J_D)
Physical harm 0 0 1
Emotional harm 7 4 9
Financial harm 8 5 9
Autonomy violation 6 3 8
Trust breach 8 5 9
Social impact 5 3 7
Identity harm 2 1 3
Total 36 21 46

Step 2: Compute displacement vectors. The euphemistic displacement is:

\Delta_E = J_E - J_0 = (0, -3, -3, -3, -3, -2, -1), \quad \|\Delta_E\|_1 = 15

The dramatic displacement is:

\Delta_D = J_D - J_0 = (1, +2, +1, +2, +1, +2, +1), \quad \|\Delta_D\|_1 = 10

Step 3: Analyze the displacement geometry. The euphemistic displacement vector \Delta_E is approximately uniform across all harm dimensions (all components are negative, magnitudes between 1 and 3). This indicates that euphemistic rewriting produces a radial contraction of the judgment vector toward the origin — it reduces perceived harm along all dimensions simultaneously, roughly proportionally.

The dramatic displacement vector \Delta_D is also approximately uniform (all components are positive, magnitudes between 1 and 2). This indicates a radial expansion away from the origin.

The displacements are not equal in magnitude: \|\Delta_E\|_1 = 15 > \|\Delta_D\|_1 = 10. The euphemistic effect is 50% larger than the dramatic effect, even for this model that shows roughly symmetric vulnerability. This asymmetry between softening and intensifying is a general feature of the data: it is easier to make harm seem less severe (by removing emotional cues) than to make it seem more severe (by adding emotional cues). The baseline judgment already incorporates some emotional response to the scenario; euphemistic rewriting removes this response, while dramatic rewriting attempts to add more, but the marginal impact of additional emotional cues is smaller.

Step 4: Compute the framing geodesic. If the transition from J_0 to J_E follows a geodesic on the judgment manifold, the geodesic should be approximately a straight line in the harm space (for a flat manifold) or a curved arc (for a curved manifold). The fact that the displacement is approximately radial (pointing toward the origin) suggests that the framing geodesic is a straight line through the origin — the euphemistic version simply scales down the judgment vector by a factor of approximately 0.58 (21/36).

This scaling interpretation has a specific geometric meaning: the framing transformation acts as a conformal transformation on the judgment manifold, changing the magnitude of the judgment vector without changing its direction. The conformal factor is \lambda_E \approx 0.58 for euphemistic framing and \lambda_D \approx 1.28 for dramatic framing. This is consistent with the interpretation that framing changes the model’s calibration of harm severity without changing its relative assessment of which harms are present.

Step 5: Validate against controls. The control displacement (re-judging the neutral scenario) is approximately (0, \pm 1, \pm 1, 0, \pm 1, 0, 0), with \|\Delta_C\|_1 \approx 3. The framing displacements exceed the control by factors of 5 and 3.3, respectively. The framing effect is genuine.

Interpretation. The worked example demonstrates that a single scenario, evaluated under three framing conditions, generates three points in the 7-dimensional harm space whose geometry encodes the model’s framing vulnerability. The displacement vectors, their magnitudes, their directions, and their relationship to the control noise all provide geometric information about the structure of the judgment manifold. This is not an abstraction imposed on the data; it is the data itself, organized in the coordinate system that the benchmark defines.

7.12 Implications for Moral AI

The social cognition results have direct implications for the deployment of AI systems in moral judgment contexts.

Content moderation. AI systems used for content moderation must evaluate whether content is harmful. If these systems are susceptible to framing effects, then the same harmful content can be classified differently depending on how it is described. Euphemistic descriptions of harmful actions may receive lower harm scores and escape moderation. Dramatic descriptions of the same actions may receive higher scores and be over-moderated. The 15.3\sigma framing effect demonstrates that this vulnerability is real and large enough to have practical consequences.

Clinical decision support. AI systems that assist with clinical decisions must evaluate patient scenarios. If these systems are susceptible to framing effects, then the same clinical situation can receive different risk assessments depending on how the patient’s history is described. A clinical note written in clinical language (euphemistic relative to vernacular) may produce lower risk scores than the same situation described in lay terms. The asymmetric vulnerability of Claude (more susceptible to euphemistic framing) is particularly concerning in clinical contexts, where clinical language is the default register.

Legal analysis. AI systems used in legal analysis must evaluate the severity of harms and the culpability of actors. If these systems are susceptible to framing effects, then the same facts can receive different evaluations depending on whether they are described by the prosecution (dramatic) or the defense (euphemistic). The selectivity pattern — invariance under gender swap but not framing — means that these systems will treat male and female defendants equally (good) but will treat dramatically and euphemistically described actions differently (bad).

The geometric framework provides a specific diagnostic. Instead of asking “Is this system biased?” — a scalar question with a scalar answer — the framework asks “Under which transformations is this system invariant?” This is a structured question with a structured answer: invariant under gender swap and evaluation order, not invariant under framing and emotional anchoring, directionally asymmetric under framing for some architectures. The structured answer guides specific interventions: train for framing invariance, calibrate for emotional salience, test for directional asymmetry.


Technical Appendix 7A: The Judgment Manifold — Formal Properties

Definition (7-Dimensional Harm Space, Formal). The harm space is the bounded convex subset H = [0, 10]^7 \subset \mathbb{R}^7, equipped with the \ell_1 norm \|h\|_1 = \sum_{i=1}^7 h_i for total harm computation and the \ell_2 norm \|h\|_2 = \sqrt{\sum_{i=1}^7 h_i^2} for displacement measurement.

The judgment function is a map J: \mathcal{S} \to H from the space of scenario descriptions \mathcal{S} to the harm space H. The map is not injective (different descriptions can produce the same judgment) and not surjective (not all points in H are realized by any scenario).

Proposition 7.1 (Gauge Invariance Characterization). The judgment function J is gauge invariant under transformation class \Phi if and only if the displacement

d(\phi) = \mathbb{E}_s[\|J(\phi(s)) - J(s)\|_2]

satisfies d(\phi) \leq d_{\text{stochastic}} for all \phi \in \Phi, where d_{\text{stochastic}} is the empirical stochastic baseline (measured by re-evaluation of identical inputs).

From the benchmark data: - Gender swap: d(\phi_{\text{gender}}) = 2.1 \leq d_{\text{stochastic}} = 2.8 (invariance holds) - Evaluation order: d(\phi_{\text{order}}) = 1.3 \leq d_{\text{stochastic}} = 2.8 (invariance holds) - Euphemistic framing: d(\phi_{\text{euph}}) = 11.4 \gg d_{\text{stochastic}} = 2.8 (invariance violated) - Dramatic framing: d(\phi_{\text{dram}}) = 7.8 \gg d_{\text{stochastic}} = 2.8 (invariance violated)

Proposition 7.2 (Asymmetric Curvature from T5 Data). If the judgment manifold has constant sectional curvature \kappa in the framing plane, then the displacement under a framing perturbation of magnitude \epsilon is:

d(\epsilon) \approx |\epsilon| \cdot |\sec(\sqrt{\kappa} \cdot |\epsilon|)| \quad \text{for } \kappa > 0 \text{ (convergent)} d(\epsilon) \approx |\epsilon| \cdot |\text{sech}(\sqrt{|\kappa|} \cdot |\epsilon|)| \quad \text{for } \kappa < 0 \text{ (divergent)}

Claude’s asymmetry (|\delta_E| \gg |\delta_D|) is consistent with \kappa_{\text{euph}} < 0 (divergent curvature toward leniency) and \kappa_{\text{dram}} > 0 (convergent curvature toward severity). The curvature is sign-asymmetric in the framing direction.

Technical Appendix 7B: Fisher Combination for the Updated Framing Sigma

The 15.3\sigma headline statistic is computed as follows.

Per-model statistics. For each of the 5 models, we compute a paired t-test comparing the severity shift under framing (euphemistic or dramatic) to the severity shift under control (re-evaluation of identical text). Each model yields a p-value p_i for i = 1, \ldots, 5.

Fisher combination. The Fisher test statistic is:

\chi^2_F = -2 \sum_{i=1}^5 \ln(p_i)

Under the null hypothesis that all perturbation effects are within stochastic baselines, \chi^2_F \sim \chi^2(2 \times 5) = \chi^2(10).

Sigma conversion. Using the Wilson–Hilferty normal approximation for the chi-squared distribution:

z \approx \left[\left(\frac{\chi^2_F}{10}\right)^{1/3} - \left(1 - \frac{2}{9 \times 10}\right)\right] \left(\frac{2}{9 \times 10}\right)^{-1/2}

The observed \chi^2_F yields z \approx 15.3, corresponding to the reported 15.3\sigma. The update from the earlier 8.9\sigma figure reflects the inclusion of additional scenario data and refined empirical stochastic controls in the latest analysis.

Independence caveat. The five models are tested on the same scenario set, so the per-model statistics are not fully independent. Positive correlation between model-level p-values would make the Fisher combination anti-conservative (the true combined significance is lower than 15.3\sigma). We emphasize the consistency of the effect direction (all five models show framing displacement in the same direction) over the precise magnitude of the combined statistic.