Chapter 2: Saussure’s Geometric Intuition

“In language there are only differences without positive terms.” — Ferdinand de Saussure, Course in General Linguistics (1916)


RUNNING EXAMPLE — THE ROSETTA TRAJECTORY

Consider three utterances: the English sentence “The king issued a decree,” the Akkadian cuneiform sequence 𒊬𒊒𒌝 šarrum iqbi (“the king spoke”), and a sequence of sperm whale codas — five clicks in a 3+2 rhythm pattern, followed by three clicks in regular spacing, produced during a social exchange. We cannot yet translate the whale’s contribution. But we can ask a question that cuts across all three: is there a structure to meaning that survives the radical differences in surface form?

The English sentence uses an alphabet of 26 letters, an inventory of roughly 170,000 words, and the SVO word order that emerged in Middle English. The Akkadian uses a syllabary of some 600 cuneiform signs, a vocabulary organized by Semitic triconsonantal roots, and SOV word order. The whale uses a repertoire of perhaps 21 coda types, distinguished by click count and inter-click rhythm. These surface systems have nothing in common. And yet the first two convey the same propositional content — a ruler performed a speech act — and we suspect the third conveys something, because it occurs in structured social contexts with statistical regularities that mirror those of human language.

This chapter argues that the question of what survives across surface forms is a geometric question, and that the history of linguistics has been converging on this recognition for over a century. Ferdinand de Saussure came closest to saying it explicitly in 1916. He lacked only the vocabulary.


2.1 Two Insights, Restated

Ferdinand de Saussure never published his most influential work. The Cours de linguistique générale (1916) was assembled posthumously by his students Charles Bally and Albert Sechehaye from lecture notes — a circumstance that has spawned a century of scholarly debate about what Saussure actually meant.[^1] But two claims emerge from the reconstruction with sufficient clarity and consistency that their attribution is not in serious doubt. Together, they constitute the foundation of modern structural linguistics. And together, restated in the vocabulary of differential geometry, they constitute the foundation of a geometric theory of meaning.

2.1.1 The Arbitrariness of the Sign as Gauge Invariance

The first claim is the arbitrariness of the sign: the relationship between a signifier (the sound image, the written form, the gestural shape) and the signified (the concept) is not motivated by any natural resemblance or causal connection. The English word “tree” does not look like a tree, sound like a tree, or share any physical property with a tree. The French arbre, the German Baum, the Japanese 木 (ki), and the Akkadian 𒄑 (iṣu) all point to the same concept through entirely different signifiers. The choice of signifier is, in Saussure’s term, arbitraire — arbitrary.

This observation was not new with Saussure. The conventionality of names had been noted by Plato in the Cratylus, debated by the medieval nominalists, and assumed by the British empiricists. What Saussure did was elevate it from a philosophical curiosity to a structural principle. If the sign is arbitrary, then meaning cannot reside in the sign. The signifier is not a window onto the signified; it is a label, and the label could have been anything.

Now restate this in the language of modern geometry. A label that can be freely changed without altering the underlying reality is a coordinate choice. And the statement that the underlying reality is invariant under change of coordinates is gauge invariance.

The parallel is precise. In electromagnetism, the scalar and vector potentials $\phi$ and $\mathbf{A}$ are mathematical descriptions of the electromagnetic field. They can be changed by a gauge transformation $\phi \to \phi - \partial_t \Lambda$, $\mathbf{A} \to \mathbf{A} + \nabla \Lambda$ without altering the physical fields $\mathbf{E}$ and $\mathbf{B}$. The potentials are coordinates on the space of field configurations; the physical content is the gauge-invariant residue — the geometric object that survives all coordinate changes.

Saussure’s arbitrariness of the sign is the same statement, applied to language. The signifier “tree” is a coordinate choice — a particular labeling of a concept in the coordinate system called English. The French signifier arbre is a different coordinate choice for the same concept in a different coordinate system. Translation between languages is a gauge transformation: it changes every label while (ideally) preserving the meaning. And meaning itself — the thing that survives translation, paraphrase, reformulation, and change of medium — is the gauge-invariant content.

$$\text{meaning}(x) = \text{meaning}(\tau(x)) \quad \text{for all gauge transformations } \tau \in G$$

where $G$ is the communication gauge group — the set of all content-preserving transformations of surface form, including translation between languages, paraphrase within a language, and reformulation across media (spoken, written, signed).

Saussure could not have stated this in 1916. Gauge theory would not be developed until Hermann Weyl’s work in 1918 and its mature formulation by Yang and Mills in 1954. But the conceptual structure is identical: the sign is a coordinate, meaning is invariant under coordinate change, and the study of meaning is the study of what survives the transformation.

[Figure 2.1] The arbitrariness of the sign as gauge invariance. Five signifiers — English “tree,” French “arbre,” German “Baum,” Japanese 木, Akkadian 𒄑 — are five coordinate representations of the same point on the semantic manifold. Translation is a gauge transformation: it changes coordinates without moving the point. The gauge-invariant content — the concept TREE — is the geometric object; the signifiers are the coordinates.

2.1.2 Meaning from Relations as Topology

Saussure’s second insight is subtler and more consequential. It is the claim captured in the epigraph: in language there are only differences without positive terms. No sign has meaning in isolation. The meaning of “dog” is not determined by some intrinsic property of the word or some direct connection to the concept DOG. It is determined by the word’s position in a network of relations — its difference from “cat,” from “wolf,” from “pet,” from “animal,” and from every other term in the language.

Saussure illustrated this with the famous example of the 8:25 Geneva-to-Paris express. The train that departs at 8:25 today is not the same physical vehicle as the train that departed at 8:25 yesterday — different locomotive, different cars, different crew. Yet we call it “the same train” because it occupies the same position in the system of railway scheduling. Its identity is relational, not intrinsic. Similarly, the English word “sheep” and the French word mouton do not have the same value, despite referring to overlapping concepts, because English distinguishes “sheep” (animal) from “mutton” (meat) while French uses mouton for both. The value of each term depends on what the system provides around it.[^2]

This is a topological statement. Topology is the branch of mathematics that studies the properties of spaces that are preserved under continuous deformation — stretching, bending, and warping, but not tearing or gluing. Topological properties are relational: they describe how parts of a space are connected to each other, not the absolute position or size of any part. A coffee cup and a donut are topologically equivalent (both have one hole); a coffee cup and a soup bowl are not (the bowl has no hole). What matters is the pattern of connectivity, not the metric details.

When Saussure says that meaning arises from relations and not from individual signs, he is saying that the semantic space has topological structure — a pattern of neighborhoods, connections, and separations that determines the value of each position. The specific labels assigned to positions (the signifiers) are coordinates that can be changed without altering the topology. What is invariant — what constitutes meaning — is the topology itself: the structure of differences, oppositions, and gradations that organizes the semantic space.

This is a deeper claim than the arbitrariness of the sign. Arbitrariness says that labels can be changed freely. Topological meaning says that the thing being labeled is not a collection of independent points but a structured space whose shape carries information. The meaning of any sign is its address in that space — and an address is nothing without the space it addresses.

$$\text{value}(\text{sign}_i) = f\left(\{d(\text{sign}_i, \text{sign}_j) : j \neq i\}\right)$$

The value of a sign is a function of its distances to all other signs in the system. Change the distance structure — add a sign, remove a sign, shift the boundaries between categories — and you change the value of every sign in the system. This is why languages are not nomenclatures. A nomenclature is a list of labels for pre-existing categories. A language is a system whose categories are defined by the labels’ mutual relations.

[Figure 2.2] Meaning as topology. Left: the English color space, with “blue” and “green” as distinct regions. Right: the Russian color space, where голубой (goluboy, light blue) and синий (siniy, dark blue) split what English treats as a single region. The topology of the color space differs across languages; the signifiers partition the space differently. The geometric content — the continuous space of visible wavelengths — is invariant, but the coordinate charts (the linguistic categories) are not. Meaning is the structure of the partition, not the labels on the parts.


2.2 The Geometric Thread in the History of Linguistics

Saussure did not develop either insight into a formal geometric theory. He died in 1913, three years before the Cours was published, and his work was suggestive rather than axiomatic. But the geometric thread he introduced — the idea that linguistic structure is a matter of relations, symmetries, and invariances rather than individual forms — was picked up and extended by subsequent generations, each of which came closer to explicit geometry without quite naming it as such.

What follows is not a comprehensive history of linguistics. It is a selective tracing of one thread: the geometric intuition that runs from Saussure through Jakobson, Chomsky, Montague, and the distributional semanticists to the word embedding models of the present day. The thesis is simple: each generation was doing geometry. The book gives them the vocabulary they lacked.

2.2.1 Jakobson’s Distinctive Features: A Discrete Symmetry Group

Roman Jakobson, working first in the Prague Linguistic Circle in the 1930s and later at Harvard and MIT, transformed Saussure’s relational principle into a precise analytical tool. His contribution, developed with Gunnar Fant and Morris Halle in Preliminaries to Speech Analysis (1952), was the theory of distinctive features: the claim that the phonological systems of all human languages can be analyzed in terms of a small set of binary oppositions.[^3]

The insight was that phonemes — the minimal sound units that distinguish meaning — are not atomic. They are bundles of features, each feature representing a binary choice: voiced or voiceless, nasal or oral, compact or diffuse, grave or acute. The phoneme /b/ in English is [+voiced, −nasal, +compact, +grave]; /p/ is the same bundle with one feature flipped: [−voiced, −nasal, +compact, +grave]. The minimal pair “bat”/”pat” is distinguished by exactly one feature, and that single-feature difference is what carries the meaning distinction.

Restate this geometrically. Each binary feature defines a dimension with two values: +1 and −1. A system of $n$ distinctive features defines a discrete space with $2^n$ possible positions, of which only some are occupied by actual phonemes in any given language. The phoneme inventory of a language is a subset of vertices of the $n$-dimensional hypercube $\{-1, +1\}^n$.

Flipping a single feature is a reflection in one dimension — a specific element of the symmetry group of the hypercube. The minimal pairs that linguists use to identify phonemes are precisely the pairs of vertices connected by a single reflection. The phonological system of a language is characterized not by a list of its phonemes (the coordinates) but by which reflections produce meaning distinctions (the symmetry structure).

This is a discrete symmetry group acting on a discrete space. The group of all possible single-feature flips is $(\mathbb{Z}/2\mathbb{Z})^n$ — the direct product of $n$ copies of the cyclic group of order 2. It acts on the hypercube by reflections, and the phonological structure of a language is determined by which orbits of this group action are linguistically relevant.

Jakobson did not use group-theoretic language. He spoke of “binary oppositions” and “distinctive features” and “the pattern of contrasts.” But the mathematical structure is unmistakable: he had discovered that phonological space has the symmetry of a discrete group, and that the inventory of contrasts — the topology of the phonological system — is what carries linguistic information.

[Figure 2.3] Jakobson’s distinctive features as a discrete symmetry group. The phonological space for three binary features (voiced/voiceless, nasal/oral, stop/fricative) is the 3-cube $\{-1, +1\}^3$. English phonemes occupy specific vertices. Minimal pairs (e.g., /b/ vs. /p/) are connected by single-feature reflections — elements of the symmetry group $(\mathbb{Z}/2\mathbb{Z})^3$. The phonological structure is the pattern of occupied vertices and meaning-distinguishing reflections, not the phonetic substance of any individual sound.

2.2.2 Chomsky’s Deep Structure: A Quotient Manifold

Noam Chomsky’s revolution, beginning with Syntactic Structures (1957) and reaching its most articulated form in Aspects of the Theory of Syntax (1965), shifted the focus from the sound system to the generative machinery of grammar. Chomsky’s central claim was that the surface forms of sentences — the strings of words we actually produce and hear — are derived from underlying deep structures by a series of syntactic transformations. The passive sentence “The decree was issued by the king” and the active sentence “The king issued the decree” have different surface forms but the same deep structure: a single underlying representation from which both can be derived.

The geometric content of this claim is striking. If surface forms are related to deep structures by transformations, then the space of surface forms has the structure of a quotient space: the deep structure is the equivalence class of all surface forms obtainable from it by syntactic transformations.

More precisely, let $\mathcal{S}$ be the space of all grammatical surface forms and let $T$ be the group of syntactic transformations (passivization, topicalization, clefting, extraposition, and so on). Two surface forms $s_1, s_2 \in \mathcal{S}$ are equivalent if $s_1 = t(s_2)$ for some $t \in T$. The equivalence classes $[s] = \{t(s) : t \in T\}$ are the deep structures, and the quotient space $\mathcal{S}/T$ is the space of meanings — the space of propositional contents modulo syntactic reformulation.

In the language of differential geometry, this is a quotient manifold: a manifold obtained by identifying points related by a group action. The projection map $\pi: \mathcal{S} \to \mathcal{S}/T$ sends each surface form to its deep structure. Fibers of the projection — the preimages $\pi^{-1}([s])$ — are the sets of paraphrases: all the surface forms that express the same deep content.

Chomsky’s subsequent theoretical evolution — from the Standard Theory through Government and Binding to the Minimalist Program — can be read as a progressive attempt to simplify the transformation group $T$, reducing the number of generators required. The Minimalist Program’s single operation Merge is, in geometric terms, the conjecture that the transformation group has a single generator: that all syntactic structure can be built from one operation, recursively applied. Whether this conjecture is correct is an empirical question about the structure of the quotient. But the form of the conjecture — that complex surface variation arises from simple underlying structure via group action — is a quintessentially geometric idea.

Chomsky, like Saussure and Jakobson before him, did not frame his work in geometric terms. He used the language of formal grammars, rewrite rules, and tree structures. But the geometric skeleton is visible: surface forms are coordinates, transformations are group elements, deep structure is the quotient, and meaning is what the quotient preserves.

[Figure 2.4] Chomsky’s deep structure as a quotient manifold. The space of surface forms $\mathcal{S}$ is fibered by the transformation group $T$. Each fiber (vertical) contains all surface realizations of a single deep structure: “The king issued a decree,” “A decree was issued by the king,” “It was the king who issued a decree,” etc. The quotient space $\mathcal{S}/T$ (horizontal) is the space of deep structures — the space of meanings. The projection $\pi$ maps surface forms to their underlying content.

2.2.3 Montague Grammar: Coordinate-Free Semantics

Richard Montague’s work in the late 1960s and early 1970s — tragically cut short by his murder in 1971 — represents perhaps the most explicitly mathematical approach to natural language semantics before the advent of computational methods. Montague’s program, crystallized in “The Proper Treatment of Quantification in Ordinary English” (1973), was to provide a rigorous model-theoretic semantics for natural language: a formal system that assigns truth conditions to sentences by composing the denotations of their parts.[^4]

Montague’s key move was to separate the syntax of natural language (the rules for building well-formed expressions) from the semantics (the rules for interpreting those expressions in a model). The syntax provides a structural description of each expression; the semantics provides a homomorphism from the syntactic algebra to the semantic algebra. Meaning, in Montague’s framework, is not a property of words or sentences but a mapping — a function from possible worlds to truth values, or from contexts to extensions.

The geometric significance of this approach is that it provides coordinate-free semantics. The meaning of a sentence is not defined relative to any particular language, notation, or formalism. It is defined as an element of an algebraic structure (typically a type-theoretic hierarchy of functions) that is invariant under the choice of syntactic representation. Different languages, different formalisms, different notations can all serve as “coordinates” for the same semantic object, just as different coordinate systems can describe the same point on a manifold.

This is gauge invariance made algebraic. The syntactic algebra is the coordinate system; the semantic algebra is the geometric object; the homomorphism from syntax to semantics is the chart map. Different languages provide different charts, but the semantic content — the image in the semantic algebra — is invariant.

Montague himself was influenced by Tarski’s model-theoretic semantics and by the Fregean tradition of compositionality. He would not have described his work as geometric. But the structural content is clear: he was constructing coordinate-free representations of meaning, defined by their algebraic properties rather than by any particular linguistic realization. In the terminology of this book, he was working in the gauge-invariant sector of the communication manifold.

2.2.4 Distributional Semantics: Empirical Manifold Learning

While the formalist tradition from Chomsky through Montague pursued meaning through logical structure, a parallel tradition pursued it through statistical regularities in language use. The distributional hypothesis, articulated by J.R. Firth in 1957 — “You shall know a word by the company it keeps” — and developed more formally by Zellig Harris in the same decade, proposes that the meaning of a word is determined by the contexts in which it appears.[^5]

This is Saussure’s relational principle operationalized. If meaning is constituted by a word’s relations to other words, and if those relations are reflected in patterns of co-occurrence, then one can measure meaning by observing which words appear in similar contexts. Two words that consistently appear in the same linguistic environments — “physician” and “doctor,” for instance — have similar meanings, precisely because they occupy similar positions in the network of distributional relations.

The distributional hypothesis leads naturally to a spatial representation. If each word is characterized by its pattern of co-occurrences, then each word can be represented as a vector in a high-dimensional space where each dimension corresponds to a context (another word, a document, a syntactic frame). Words with similar meanings are nearby in this space; words with different meanings are far apart. The distance between word vectors is a measure of semantic difference.

What Firth and Harris were doing, without the terminology, was manifold learning. They were proposing that the high-dimensional space of all possible co-occurrence patterns contains a lower-dimensional structure — the semantic manifold — and that this structure can be discovered empirically by observing how words actually distribute across contexts. The co-occurrence matrix is the raw data; the semantic manifold is the structure it reveals.

The statistical techniques for extracting this structure developed over decades. Latent Semantic Analysis (Deerwester et al., 1990) applied singular value decomposition to the term-document matrix — in geometric terms, a linear approximation to the semantic manifold, a projection onto the tangent space at a point.[^6] Later methods introduced nonlinearity: Latent Dirichlet Allocation (Blei, Ng, and Jordan, 2003) modeled documents as mixtures on a simplex; spectral methods (Belkin and Niyogi, 2003) explicitly aimed to recover manifold structure via Laplacian eigenmaps. Each method was a step toward the same geometric goal: recovering the shape of the semantic manifold from empirical observations. The vocabulary was that of statistics and linear algebra, but the underlying program was geometric.

2.2.5 Word2vec and Modern Embeddings: Explicit Coordinates

The transition from implicit to explicit geometry occurred in 2013, when Tomas Mikolov and his colleagues at Google introduced Word2vec — a neural network that learns dense, low-dimensional vector representations of words from large text corpora.[^7] Word2vec was not the first word embedding method; Bengio et al. (2003) had trained neural language models that produced word vectors a decade earlier. But Word2vec’s combination of scalability, simplicity, and striking algebraic properties made the geometric structure of word meaning impossible to ignore.

The most celebrated property was the vector arithmetic of analogy. In Word2vec’s learned space, the relationship between “king” and “queen” is approximately the same vector as the relationship between “man” and “woman”:

$$\vec{\text{king}} - \vec{\text{man}} + \vec{\text{woman}} \approx \vec{\text{queen}}$$

This is a geometric statement: the semantic relation “male → female” defines a direction in the embedding space, and this direction is approximately constant across different word pairs that instantiate the same relation. The relation is not a property of any individual word; it is a property of the space — a direction on the semantic manifold.

Subsequent developments made the geometric structure richer. GloVe (Pennington, Socher, and Manning, 2014) derived embeddings from a global co-occurrence matrix, making the connection to distributional semantics explicit.[^8] ELMo (Peters et al., 2018) produced context-dependent embeddings — different vectors for the same word in different contexts — effectively parameterizing the semantic manifold by context as well as word identity. BERT (Devlin et al., 2019) and GPT (Radford et al., 2018, 2019) extended this to full-sentence representations in high-dimensional spaces where the geometry of meaning is encoded in the attention patterns and residual stream trajectories.

The progression from Word2vec to modern transformer representations is, in geometric terms, a progression from flat to curved. Word2vec’s embedding space is Euclidean — a flat manifold where distances are computed by the standard inner product. But semantic space is not flat. Hierarchical relations (hypernymy: “animal” → “dog” → “poodle”) are poorly served by Euclidean geometry, which lacks the exponential volume growth needed to embed trees without distortion. Nickel and Kiela (2017) demonstrated that hyperbolic embeddings — representations in the Poincaré ball model of hyperbolic geometry — embed hierarchical semantic structure with far less distortion than their Euclidean counterparts.[^9] This result connects directly to the hyperbolic attention mechanism we develop for cuneiform translation in Chapter 9: the sign taxonomy of the Akkadian writing system embeds naturally in hyperbolic space because it is hierarchical, and injecting that embedding into the transformer’s attention pattern improves translation quality on rare signs.

[Figure 2.5] The progression from distributional semantics to explicit geometry. Left: the co-occurrence matrix (Firth/Harris), a high-dimensional implicit representation. Center: the Word2vec embedding, explicit coordinates in a flat (Euclidean) space. Right: the Poincaré ball embedding (Nickel and Kiela, 2017), explicit coordinates in a curved (hyperbolic) space. Each step makes the geometric structure more explicit and more faithful to the actual topology of semantic relations.


2.3 The Pattern: Geometry Without the Name

The historical trajectory traced in the previous section reveals a pattern. Each generation of linguists and language scientists independently discovered a geometric structure in language — and each described it in the vocabulary of its own disciplinary tradition, without recognizing the common geometric core.

Saussure discovered gauge invariance (the sign is arbitrary) and topology (meaning is relational), but described them in the vocabulary of semiotics and structural anthropology.

Jakobson discovered discrete symmetry groups (distinctive features as binary oppositions acting on phonological space), but described them in the vocabulary of Prague School structuralism.

Chomsky discovered quotient manifolds (deep structure as equivalence classes of surface forms under syntactic transformations), but described them in the vocabulary of formal grammar and generative theory.

Montague discovered coordinate-free semantics (model-theoretic meaning as an algebraic object invariant under syntactic representation), but described it in the vocabulary of intensional logic and type theory.

Firth and Harris discovered empirical manifold learning (distributional meaning as structure in co-occurrence space), but described it in the vocabulary of corpus linguistics and structural analysis.

Mikolov et al. discovered explicit manifold coordinates (word embeddings as vectors on the semantic manifold), but described them in the vocabulary of neural networks and optimization.

Nickel and Kiela discovered that the semantic manifold has non-Euclidean curvature (hyperbolic embeddings for hierarchical meaning), but described it in the vocabulary of representation learning.

Each step was a step toward the same destination: the recognition that meaning is a geometric object — an element of a structured space with metric, topology, and symmetry — and that the surface forms of language are coordinates on that space. The coordinates are language-specific. The geometric object is language-invariant. Translation is a change of coordinates. Paraphrase is a gauge transformation. Understanding is the recovery of the geometric object from its coordinate representation.

The convergence is not coincidental. It reflects a genuine structural fact about language: meaning has geometry, and any sufficiently careful study of meaning will rediscover that geometry, whatever vocabulary the study employs. The contribution of this book is not to discover the geometry — Saussure, Jakobson, Chomsky, Montague, Firth, and their successors discovered it. The contribution is to name it, unify it, and extend it with the tools of modern differential geometry, algebraic topology, and gauge theory.


2.4 Coordinates vs. Geometric Objects

The central distinction that emerges from this history is the distinction between coordinates and geometric objects. It is worth making this distinction sharp, because it organizes everything that follows in the book.

2.4.1 What Is a Coordinate?

In mathematics, a coordinate system is a labeling scheme that assigns numbers (or symbols) to points in a space. Different coordinate systems can label the same space: Cartesian coordinates $(x, y)$ and polar coordinates $(r, \theta)$ describe the same Euclidean plane. The coordinates change; the plane does not.

In language, the following are all coordinates:

In every case, the coordinate is a choice — a convention adopted by a community, a tradition, or an individual. The choice is necessary (you cannot communicate without choosing some surface form) but not intrinsic to the meaning being communicated.

2.4.2 What Is a Geometric Object?

A geometric object is something that exists independently of any coordinate system. It can be expressed in coordinates, but its identity does not depend on any particular expression. A vector is a geometric object: it has magnitude and direction regardless of whether you describe it in Cartesian, polar, or cylindrical coordinates. The coordinates change when you rotate the reference frame; the vector does not.

In language, the following are geometric objects:

2.4.3 The Discipline’s Central Confusion

Much of the history of linguistics can be reread as a struggle to separate coordinates from geometric objects — a struggle made difficult by the fact that we can only access geometric objects through coordinates.

The Sapir-Whorf hypothesis, in its strong form, is the claim that coordinates are the geometric objects — that the structure of a language determines the structure of thought, so that speakers of different languages literally inhabit different semantic spaces. This is the claim that gauge invariance fails: that translation is not merely a change of coordinates but a change of world.

The strong Sapir-Whorf hypothesis is almost certainly false. The empirical evidence for cross-lingual invariance of core semantic structure — from Berlin and Kay’s (1969) work on color categories, through Greenberg’s universals of word order, to the BIP experiments reported in Chapter 11 of this book, which demonstrate 44.5% F1 cross-lingual transfer of moral classification (chance = 25%, $p < 10^{-50}$) — consistently shows that deep semantic structure survives translation.[^10] The coordinates change; the geometric content, at least in its coarse topology, persists.

The weak Sapir-Whorf hypothesis is more interesting geometrically. It says that the choice of coordinates affects cognition at the margins — that the particular partition a language imposes on the semantic manifold makes some distinctions easier to notice and others harder. This is not a denial of gauge invariance; it is a statement about the computational cost of coordinate change. A Russian speaker who has lexicalized the light-blue/dark-blue distinction (голубой vs. синий) can access that distinction more quickly than an English speaker who must compute it on the fly. The geometric object (the continuous color space) is the same; the coordinates (the lexical partition) affect the efficiency of access. In the framework of this book, this is a statement about the heuristic field: the language-specific lexicon functions as a heuristic that guides attention toward certain distinctions and away from others, without altering the underlying manifold.


2.5 The Communication Gauge Group

The distinction between coordinates and geometric objects leads directly to the specification of the communication gauge group — the group of transformations under which meaning is invariant. This group is the structural backbone of the book, and it is worth defining it carefully before proceeding to the manifold construction in Chapter 3.

2.5.1 Three Components

The communication gauge group $G$ has three principal components:

$$G = T \times R \times F$$

The translation group $T$: transformations mapping an expression in one natural language to an expression in another while preserving semantic content. The re-description group $R$: within-language transformations that change surface form while preserving content — paraphrase, syntactic transformation, register shift, synonym replacement. Chomsky’s syntactic transformations are a subgroup of $R$. The format group $F$: medium-change transformations — spoken ↔ written ↔ signed ↔ encoded (Morse, Braille, semaphore) — that change the physical channel without altering the content.

2.5.2 Meaning as Gauge-Invariant Content

Given the gauge group $G$, we can now define meaning precisely:

Definition. The meaning of a communicative act $x$ is the orbit of $x$ under $G$ — the equivalence class $[x] = \{g(x) : g \in G\}$ consisting of all surface forms related to $x$ by gauge transformations.

Equivalently, meaning is the information that is invariant under all elements of $G$. If we denote by $f: X \to Y$ a function that extracts “what was communicated” from a surface form, then $f$ is gauge-invariant if and only if $f(g(x)) = f(x)$ for all $g \in G$ and all $x \in X$.

This definition has consequences.

First, meaning is not a string, a parse tree, or a logical form. It is an equivalence class. No particular member of the class is privileged. The English expression, the French expression, and the cuneiform expression are all equally valid representatives — different coordinates for the same geometric object.

Second, the degree to which a system’s outputs respect gauge invariance is an empirical question. A perfect translator would be exactly gauge-invariant. Real translators — human and machine — are approximately gauge-invariant at best. The discrepancy is the gauge anomaly, and its measurement is one of the central empirical programs of this book.

Third, some communicative content may not be fully gauge-invariant. Poetry depends on sound; legal language depends on specific formulations; puns depend on lexical coincidence. These communicative acts have components in the gauge sector — structure that is genuine but coordinate-dependent. A complete geometric analysis must account for both the gauge-invariant content and the gauge-dependent form.


2.6 Connecting to the Communication BIP

The communication gauge group provides the foundation for the domain-specific instantiation of the Bond Invariance Principle (BIP) developed in Geometric Reasoning (Bond, 2026c, Ch. 8).

The general BIP states that the output of a reasoning system should be invariant under transformations that preserve the content of the input. The communication BIP specializes this to the communication domain:

The Communication BIP. The interpretation of a communicative act should be invariant under the communication gauge group $G = T \times R \times F$. That is: translation, paraphrase, and medium change should not alter the recovered meaning.

This is not a normative aspiration. It is a structural property that any system claiming to extract meaning from language must possess. If a translation system produces different outputs for “The king issued a decree” and “Un décret fut émis par le roi” (given that these have the same propositional content), the system is responding to coordinates, not to the geometric object. If a sentiment analysis system assigns different polarities to “He passed away peacefully” and “He died without suffering,” it is confusing the gauge sector (register, euphemism) with the invariant content (a death occurred, without pain).

The empirical results reported in Chapter 11 provide the strongest available evidence that semantic gauge invariance holds. The BIP cross-lingual transfer experiment trains a moral classifier on Hebrew and Aramaic texts spanning 500 BCE to 1800 CE and tests it on American English advice columns from 1956 to 2020. The result — 44.5% F1, with chance at 25% and $p < 10^{-50}$ — demonstrates that the relational structure of moral meaning is gauge-invariant under translation across languages, centuries, and civilizations. The coordinates are utterly different: Biblical Hebrew and mid-century American English share no vocabulary, syntax, or cultural context. But the gauge-invariant content — the topology of moral relations — survives.

For the Rosetta trajectory, the communication BIP is the organizing principle. At each step — whale codas, birdsong, cuneiform, modern language — we ask: what structure is invariant under the transformations specific to that communication system? In every case, the geometric content is what the gauge group preserves.


2.7 What Saussure Could Not Have Known

This chapter has argued that Saussure’s two foundational insights — the arbitrariness of the sign and the relational nature of meaning — are geometric claims, and that the subsequent history of linguistics has been a progressive uncovering of the geometry they imply. It is worth closing by noting what Saussure could not have known in 1916 and what the intervening century of mathematics and computation has provided.

He could not have known that gauge theory would be developed. Hermann Weyl’s first gauge-theoretic paper appeared in 1918 — two years after the Cours — and addressed a question in general relativity, not linguistics. The non-Abelian gauge theories that underpin the Standard Model were not formulated until Yang and Mills (1954). The idea that invariance under re-description is a structural principle rather than a philosophical curiosity was not available to Saussure.

He could not have known that computers would learn word embeddings. The Word2vec paper appeared 97 years after the Cours. But its results are the natural consequence of Saussure’s own principle: if meaning is relational, and if co-occurrence reflects relation, then a system that learns co-occurrence patterns will learn a representation of meaning — and that representation will have geometric structure, because relations are geometry.

He could not have known about non-Euclidean semantic space. The Poincaré ball embeddings that we use for cuneiform sign taxonomy (Chapter 9) and whale coda hierarchy (Chapter 7) require the discovery — a full century after Saussure — that hierarchical semantic relations embed more naturally in hyperbolic than Euclidean space (Nickel and Kiela, 2017). Yet this too follows from Saussure’s relational principle: if semantic categories are organized hierarchically, and if hierarchy requires exponential volume growth, then the semantic manifold must have negative curvature in the hierarchical dimensions.

What Saussure did know, and what he stated with remarkable clarity, is the principle from which all of this follows: meaning is not a property of signs. It is a property of the system of relations among signs. The signs are coordinates. The relations are geometry. And the geometry is what this book aims to make explicit.


Summary

This chapter has traced the geometric thread running through the history of linguistics, from Saussure’s founding insights to modern word embeddings. The key claims:

  1. Saussure’s arbitrariness of the sign is gauge invariance: the signifier is a coordinate choice, and meaning is the gauge-invariant content that survives coordinate change.

  2. Saussure’s relational meaning is topological structure: the value of a sign is determined by its position in a network of relations, not by any intrinsic property — this is a statement about the topology of semantic space.

  3. Jakobson’s distinctive features are a discrete symmetry group: phonological space is a hypercube, and the minimal-pair contrasts that carry linguistic information are the reflections of the group $(\mathbb{Z}/2\mathbb{Z})^n$.

  4. Chomsky’s deep structure is a quotient manifold: meaning is the equivalence class of surface forms modded out by syntactic transformations, and the space of meanings is the quotient $\mathcal{S}/T$.

  5. Montague’s model-theoretic semantics is coordinate-free meaning: the semantic algebra is a geometric object, and the syntactic algebra is a coordinate system.

  6. Distributional semantics is empirical manifold learning: Firth and Harris proposed recovering the structure of the semantic manifold from observations of word co-occurrence.

  7. Word2vec and its successors provide explicit coordinates on the semantic manifold: the progression from flat (Euclidean) to curved (hyperbolic) embeddings reflects the actual geometry of semantic space.

  8. The communication gauge group $G = T \times R \times F$ — translation, re-description, and format change — defines the transformations under which meaning is invariant. Meaning is the gauge-invariant residue.

  9. The communication BIP — the requirement that interpretation be invariant under $G$ — is the structural test for whether a system is extracting meaning or responding to coordinates.

The next chapter constructs the communication manifold explicitly: the product space $M = S \times P \times C$ on which all communicative acts live, equipped with the metric, symmetry, and heuristic field structure inherited from the parent framework of Geometric Reasoning.


[^1]: The textual situation is well documented in Bouissac (2010) and Harris (2001). Saussure’s own notes, discovered in 1996 in a Geneva orangery, confirm the broad outline of the Cours while differing on many details. See Écrits de linguistique générale (Saussure, 2002).

[^2]: The sheep/mutton example appears in the Cours (Part II, Chapter IV, Section 2). The Geneva express example appears in the same chapter. Both illustrate the principle of valeur — linguistic value as determined by the system, not by any individual term.

[^3]: Jakobson, Fant, and Halle, Preliminaries to Speech Analysis: The Distinctive Features and their Correlates (MIT Press, 1952). The system was refined over subsequent decades, culminating in the feature geometry of Clements (1985) and the underspecification theory of Archangeli (1988).

[^4]: Montague’s three foundational papers — “English as a Formal Language” (1970a), “Universal Grammar” (1970b), and “The Proper Treatment of Quantification in Ordinary English” (1973) — are collected in Thomason (1974). The program has been extensively developed by Partee, Dowty, Peters, and others.

[^5]: Firth, “A Synopsis of Linguistic Theory, 1930–1955,” in Studies in Linguistic Analysis (Blackwell, 1957), p. 11. Harris, “Distributional Structure,” Word 10 (1954), pp. 146–162.

[^6]: Deerwester, Dumais, Furnas, Landauer, and Harshman, “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science 41 (1990), pp. 391–407.

[^7]: Mikolov, Chen, Corrado, and Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv:1301.3781 (2013). The companion paper, Mikolov, Sutskever, Chen, Corrado, and Dean, “Distributed Representations of Words and Phrases and their Compositionality,” NeurIPS 2013.

[^8]: Pennington, Socher, and Manning, “GloVe: Global Vectors for Word Representation,” EMNLP 2014.

[^9]: Nickel and Kiela, “Poincaré Embeddings for Learning Hierarchical Representations,” NeurIPS 2017.

[^10]: Berlin and Kay, Basic Color Terms: Their Universality and Evolution (University of California Press, 1969). Greenberg, “Some Universals of Grammar with Particular Reference to the Order of Meaningful Elements,” in Universals of Language (MIT Press, 1963).