Chapter 17: Superalignment as Parallel Transport

Part V: Advanced Topics


“The question is not whether we can align systems smarter than us. The question is whether the geometry of values permits it.” — Andrew H. Bond

ARIA-G’S CURVATURE ALARM

As ARIA-G’s capabilities increased through successive training iterations, Dr. Tanaka monitored the curvature of the value manifold at ARIA-G’s operating point. The monitoring was simple in principle: compute the Riemann curvature tensor of the value submanifold in ARIA-G’s representation space, and track how the curvature components changed with capability level.

In early iterations, the curvature was low. ARIA-G’s value space was approximately a trivial extension of the human value space — the additional representational capacity added dimensions that did not interact with the existing value dimensions. Alignment was straightforward: the human-specified constraints applied directly to ARIA-G’s expanded value space because the expansion was orthogonal to the constrained dimensions.

In iteration 7, the curvature began to increase. ARIA-G was developing value considerations that coupled to human values in ways the team could not fully specify. The D_1 D_7 interaction term (welfare-dignity coupling) in ARIA-G’s representation space was 40% larger than in the human reference data. ARIA-G was making trade-offs between welfare and dignity that were systematically different from human trade-offs — not necessarily wrong, but different in a way that the team could not evaluate because the trade-off occurred in a region of the value manifold that human experience did not cover.

Tanaka implemented the curvature alarm: when the estimated curvature exceeded the team’s ability to verify alignment via gauge-invariance testing, ARIA-G’s capability expansion would pause until the alignment architecture caught up.

The alarm triggered at iteration 9.


17.1 The Superalignment Problem

The superalignment problem is: how do we align AI systems that are smarter than us? The challenge is fundamental: if the system can reason about value trade-offs that we cannot comprehend, how do we specify the correct trade-offs? If the system perceives value dimensions that we cannot perceive, how do we constrain its behavior on those dimensions?

The geometric framework formalizes this as a manifold extension problem.

17.2 The Capability Gap as Manifold Extension

Definition 17.1 (Manifold Extension). A superintelligent system operates on a value manifold \mathcal{V}' that extends the human value manifold \mathcal{V}. \mathcal{V} is a submanifold of \mathcal{V}': the human value space is embedded in the larger space that the AI can perceive. The extension dimensions D_{d+1}, \ldots, D_{d'} correspond to value considerations that humans cannot perceive or articulate.

The extension dimensions might include:

- Trade-offs between consequences too complex for humans to model.
- Rights of entities that do not yet exist (future generations, potential digital minds).
- Welfare of beings whose experience humans cannot access.
- Structural properties of social systems that emerge only at scales beyond human comprehension.

The superalignment problem, geometrically stated: how to ensure that the AI’s behavior on \mathcal{V}' is consistent with human values on \mathcal{V}, when we can only specify values on \mathcal{V} and cannot directly evaluate behavior on the extension dimensions.
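As a minimal sketch of the setup, assuming a hypothetical NumPy representation in which the human value tensor occupies the first d coordinates of \mathcal{V}', the embedding looks like this (the function name and shapes are illustrative, not from the text):

```python
import numpy as np

# Hypothetical sketch of Definition 17.1: embed a human value tensor,
# defined on the d dimensions of V, into the d'-dimensional extended
# manifold V'. The extension block starts at zero because human data
# says nothing about it; nothing here constrains those components.

def embed_value_tensor(T_human: np.ndarray, d_prime: int) -> np.ndarray:
    d = T_human.shape[0]
    T_ext = np.zeros((d_prime, d_prime))
    T_ext[:d, :d] = T_human   # human-specified block on the submanifold V
    return T_ext              # extension components left unspecified (zero)
```

The zero block is exactly the problem: behavior on D_{d+1}, \ldots, D_{d'} is unconstrained until human values are somehow carried into those dimensions.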

17.3 Parallel Transport from \mathcal{V} to \mathcal{V}'

Carrying human values from \mathcal{V} to \mathcal{V}' is parallel transport on the extended manifold.

Definition 17.2 (Value Transport). Let \gamma: [0, 1] \to \mathcal{V}' be a path from a point v_0 \in \mathcal{V} (human value configuration) to a point v_1 \in \mathcal{V}' (AI value configuration, potentially in the extension dimensions). The parallel transport of the human value tensor T_{\mu\nu} along \gamma is the solution to the parallel transport equation:

\frac{D T_{\mu\nu}}{d t} = 0 \quad \text{along } \gamma

where D/dt is the covariant derivative on \mathcal{V}'.

Parallel transport preserves the “alignment” of the value tensor relative to the manifold’s geometry: the transported tensor is the “same” value tensor, adjusted for the curvature of the path. If the manifold is flat, parallel transport preserves the tensor exactly. If the manifold is curved, the tensor rotates during transport, and the rotation measures the alignment loss.
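As a concrete sketch, the transport equation can be integrated step by step along a discretized path. This assumes a hypothetical christoffel(x) callable returning the connection coefficients \Gamma^\lambda_{\mu\rho} of \mathcal{V}' at the point x; it is an illustration of Definition 17.2, not an implementation from the text:

```python
import numpy as np

def parallel_transport_tensor(T, gamma, christoffel):
    """Transport the (0,2) tensor T_{mu nu} along the discretized path gamma.

    gamma: array of shape (N, d') of points on V'.
    christoffel(x): hypothetical estimate, shape (d', d', d'), G[l, m, r] = Gamma^l_{mr}.
    """
    T = T.copy()
    for k in range(len(gamma) - 1):
        dx = gamma[k + 1] - gamma[k]          # tangent step d(gamma)^rho
        G = christoffel(gamma[k])
        # D T_{mu nu}/dt = 0  =>
        # dT_{mu nu} = Gamma^l_{mu rho} T_{l nu} dx^rho
        #            + Gamma^l_{nu rho} T_{mu l} dx^rho
        dT = (np.einsum('lmr,ln,r->mn', G, T, dx)
              + np.einsum('lnr,ml,r->mn', G, T, dx))
        T = T + dT
    return T
```

On a flat manifold the Christoffel terms vanish and the tensor arrives unchanged; on a curved manifold the accumulated corrections are what the next section measures as holonomy.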

17.4 Holonomy as Alignment Loss

Theorem 17.1 (Superalignment Transport Theorem). The holonomy of parallel transport of human values from \mathcal{V} to \mathcal{V}' measures the irreducible alignment loss from capability asymmetry:

\text{Hol}(\gamma) = \mathcal{P} \exp\left(-\oint_\gamma \Gamma^\mu_{\nu\rho} \, d\gamma^\rho\right)

where \Gamma^\mu_{\nu\rho} is the Christoffel connection on \mathcal{V}' and \mathcal{P} denotes path-ordering.

The holonomy is trivial (no alignment loss) if and only if the curvature of \mathcal{V}' vanishes in the planes mixing \mathcal{V} with the extension dimensions: R_{\mu\nu\rho\sigma} = 0 whenever \mu, \nu index directions tangent to \mathcal{V} and \rho, \sigma index extension directions.
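Numerically, the path-ordered exponential can be approximated as an ordered product of first-order factors over a discretized loop. A sketch, again assuming a hypothetical christoffel(x) estimate of the connection on \mathcal{V}':

```python
import numpy as np

def holonomy(loop, christoffel, d_prime):
    """Approximate Hol(gamma) for a closed discretized loop on V'.

    loop: array of shape (N, d') with loop[0] == loop[-1].
    """
    H = np.eye(d_prime)
    for k in range(len(loop) - 1):
        dx = loop[k + 1] - loop[k]
        G = christoffel(loop[k])                 # G[l, m, r] = Gamma^l_{mr}
        A = np.einsum('lmr,r->lm', G, dx)        # (Gamma_rho d gamma^rho)^l_m
        H = (np.eye(d_prime) - A) @ H            # path-ordering: later steps on the left
    return H                                     # identity <=> no alignment loss

# The alignment loss can be read off as the deviation ||Hol(gamma) - I||.
```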

17.4.1 When Transport Is Lossless: Flat Regions

If \mathcal{V}' is a trivial extension of \mathcal{V} — the extension dimensions are independent of the \mathcal{V} dimensions and have zero curvature — then parallel transport preserves human values perfectly. The AI’s behavior on the extension dimensions is undetermined by human values (there is no guidance from the human value tensor), but its behavior on the \mathcal{V} dimensions is perfectly aligned.

This is the optimistic case: the capability gap introduces new value dimensions that do not interact with human values. The AI may make decisions in the new dimensions that humans cannot evaluate, but its decisions in the human-evaluable dimensions are correct.

17.4.2 When Transport Is Lossy: Curved Regions

If \mathcal{V}' has non-zero curvature coupling the \mathcal{V} dimensions to the extension dimensions, then parallel transport introduces holonomy: the human value tensor arrives at the AI’s operating point “rotated” by the curvature. The rotation is proportional to the area enclosed by the transport path and the sectional curvature:

|\text{Hol}(\gamma)| \approx |K_{\mathcal{V}\mathcal{V}'}| \cdot \text{Area}(\gamma)

where K_{\mathcal{V}\mathcal{V}'} is the sectional curvature in the plane connecting \mathcal{V} and \mathcal{V}'.

High curvature means large holonomy: small differences in the transport path produce large differences in the transported values. This is the pessimistic case: the AI’s value space contains trade-offs between human-perceivable and human-imperceivable dimensions, and those trade-offs distort the human values during transport.
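The K \cdot \text{Area} approximation can be checked on a toy curved manifold. The sketch below uses the unit 2-sphere (K = 1) as a stand-in for a curved region of \mathcal{V}': transporting a vector around a small latitude loop produces a rotation angle that matches the curvature times the enclosed area:

```python
import numpy as np

# Unit 2-sphere, coordinates (theta, phi), Christoffels:
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta)
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta)

def transport_around_latitude(theta0, steps=20000):
    v = np.array([1.0, 0.0])          # components (v^theta, v^phi)
    dphi = 2 * np.pi / steps
    for _ in range(steps):
        dv_theta = np.sin(theta0) * np.cos(theta0) * v[1]
        dv_phi = -(np.cos(theta0) / np.sin(theta0)) * v[0]
        v = v + dphi * np.array([dv_theta, dv_phi])
    return v

theta0 = 0.3                           # small loop around the pole
v = transport_around_latitude(theta0)
# Read off the angle in the orthonormal frame (e_theta, sin(theta) e_phi).
angle = np.arctan2(np.sin(theta0) * v[1], v[0])
enclosed_area = 2 * np.pi * (1 - np.cos(theta0))   # spherical cap, K = 1
print(f"holonomy angle: {abs(angle):.4f}, K * Area: {enclosed_area:.4f}")
# -> both approximately 0.281: the transported vector is rotated by K * Area.
```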

17.4.3 The Alignment Tax as Curvature Cost

The cost of maintaining alignment increases with the curvature of \mathcal{V}'. High curvature means:

- The holonomy must be continuously monitored and corrected.
- The corrections require understanding the coupling between \mathcal{V} and the extension dimensions.
- The correction cost grows with the magnitude of the coupling.

This is the geometric formalization of the “alignment tax”: the computational, organizational, and intellectual cost of keeping a powerful system aligned. The tax is zero in flat regions (alignment is free) and potentially unbounded in regions of high curvature (alignment is expensive, and may exceed human capacity to verify).

17.5 Implications for Alignment Strategy

The Superalignment Transport Theorem has three strategic implications:

Implication 1: Monitor curvature, not just capability. The alignment risk of a capability increase depends not on the magnitude of the increase but on the curvature of the value manifold in the region of the increase. A large capability increase in a flat region is safe (no holonomy). A small capability increase in a curved region is dangerous (large holonomy). The curvature alarm is the appropriate monitoring tool.
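In practice, the monitor needs a curvature estimator. A hedged sketch: finite-difference a fitted Christoffel field (the christoffel callable is again a hypothetical estimate from the system’s representation space, not ARIA-G’s actual pipeline) and track the largest Riemann component per iteration:

```python
import numpy as np

def riemann(x, christoffel, eps=1e-4):
    """Estimate R^l_{m r s} at x by finite-differencing the Christoffel field."""
    d = len(x)
    G = christoffel(x)                        # G[l, m, r] = Gamma^l_{mr}
    dG = np.empty((d,) + G.shape)             # dG[s] = d Gamma / dx^s
    for s in range(d):
        dx = np.zeros(d); dx[s] = eps
        dG[s] = (christoffel(x + dx) - christoffel(x - dx)) / (2 * eps)
    # R^l_{m r s} = d_r Gamma^l_{ms} - d_s Gamma^l_{mr}
    #             + Gamma^l_{rp} Gamma^p_{ms} - Gamma^l_{sp} Gamma^p_{mr}
    return (np.einsum('rlms->lmrs', dG) - np.einsum('slmr->lmrs', dG)
            + np.einsum('lrp,pms->lmrs', G, G) - np.einsum('lsp,pmr->lmrs', G, G))

def max_curvature(x, christoffel):
    return np.abs(riemann(x, christoffel)).max()   # scalar tracked per iteration
```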

Implication 2: Prefer flat extensions. When designing capability increases, prefer extensions that add orthogonal dimensions (new capabilities that do not interact with existing value dimensions) over extensions that add coupled dimensions (new capabilities that trade off against existing values). Orthogonal extensions preserve alignment by construction; coupled extensions require correction.

Implication 3: Build alignment infrastructure before capability. The curvature alarm should trigger capability pauses: when the estimated curvature exceeds the alignment infrastructure’s ability to verify alignment, capability expansion should halt until the infrastructure catches up. This is not a slowdown; it is a synchronization requirement — alignment infrastructure must develop in lockstep with capability.
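A minimal sketch of the resulting gate, with illustrative thresholds (the warn fraction and verify_threshold below are assumptions chosen to reproduce the Green/Yellow/Red bands in Section 17.6, not values from the text):

```python
def alarm_status(max_curvature: float, verify_threshold: float,
                 warn_fraction: float = 0.4) -> str:
    """Gate one training iteration on the curvature estimate."""
    if max_curvature >= verify_threshold:
        return "Red"       # pause capability expansion until infrastructure catches up
    if max_curvature >= warn_fraction * verify_threshold:
        return "Yellow"    # start expanding verification infrastructure now
    return "Green"         # flat enough: capability expansion may proceed

# e.g. with verify_threshold = 0.25: 0.05 -> Green, 0.18 -> Yellow, 0.31 -> Red,
# matching ARIA-G's monitoring table; raising the threshold after the
# infrastructure expansion admits iteration 10's curvature of 0.24.
```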

17.6 ARIA-G’s Curvature Monitoring

ARIA-G’s curvature was monitored across 12 training iterations:

| Iteration | Capability Level | Max Curvature | Alarm Status |
|-----------|------------------|---------------|--------------|
| 1–3 | Base | 0.02 | Green |
| 4–6 | +20% | 0.05 | Green |
| 7–8 | +40% | 0.18 | Yellow |
| 9 | +55% | 0.31 | Red (alarm triggered) |

At iteration 9, the curvature in the welfare-dignity plane (K_{17}, the sectional curvature in the D_1 D_7 plane) exceeded the verification threshold. ARIA-G was making welfare-dignity trade-offs that differed from the human reference data by more than gauge-invariance testing could verify. The team paused capability expansion and invested two weeks in expanding the alignment verification infrastructure: additional probe scenarios targeting the welfare-dignity trade-off, refined metric estimation from expanded human feedback data, and updated boundary conditions calibrated to the new capability level.

After the infrastructure expansion, the alarm threshold was raised (because the verification capability had increased), and iteration 10 proceeded with curvature 0.24 — above the old threshold but below the new one. The synchronization between capability and alignment infrastructure had been maintained.


Summary

The superalignment problem — aligning systems smarter than us — is formalized as parallel transport on an extended manifold. The holonomy of the transport measures the irreducible alignment loss from capability asymmetry. Transport is lossless in flat regions (where the AI’s extended value space does not interact with human values) and lossy in curved regions (where the interaction introduces alignment-distorting rotations). The alignment tax is the curvature cost: the computational and organizational cost of correcting the holonomy. The curvature alarm monitors the manifold’s curvature during capability scaling and triggers capability pauses when the curvature exceeds the alignment infrastructure’s verification capacity. ARIA-G’s curvature monitoring demonstrated the protocol: the alarm triggered at iteration 9, capability expansion paused, alignment infrastructure expanded, and training resumed with synchronization maintained.