F5. Growing a Language / SICP Principles / Semantic Compression

Diagnoses: D5. Flat Domain / Missing Vocabulary

Related fixes:

F9 (Type-Centric Modularization) — types are the primitives the language is grown from; modules provide the means of combination and abstraction
F8 (Cognitive Load) — each named compound concept is a chunk that fits in working memory as one slot

The concept

(Guy Steele, “Growing a Language”, OOPSLA 1998; Abelson & Sussman, SICP, 1985; Casey Muratori, “Semantic Compression”, 2014)

Abelson and Sussman identify three mechanisms that every powerful language (and, by extension, every design system) provides: primitives (the atomic building blocks), means of combination (ways to compose smaller things into bigger things), and means of abstraction (ways to name compound things so they can be used as if they were primitives). This triad is the generative grammar of software design.
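A minimal sketch of the triad in Python may make the three roles concrete. The names Transform, double, increment, and compose are hypothetical and chosen only for illustration; they do not come from SICP.

```python
from typing import Callable

# Hypothetical type alias: every piece at every layer has this shape.
Transform = Callable[[float], float]


# Primitives: the atomic building blocks.
def double(x: float) -> float:
    return x * 2


def increment(x: float) -> float:
    return x + 1


# Means of combination: build a bigger Transform from two smaller ones.
def compose(first: Transform, second: Transform) -> Transform:
    def combined(x: float) -> float:
        return second(first(x))

    return combined


# Means of abstraction: name the compound so it can be used like a primitive.
double_then_increment = compose(double, increment)

# The named compound is itself a Transform, so it combines again.
pipeline = compose(double_then_increment, double)
```

Because compose returns the same kind of thing it accepts, the named compound can be fed back into compose without any special handling.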

Steele’s complementary insight: a good language (or domain model) is one that can be grown from small pieces. You start with a few well-chosen primitives. Users (or later code) compose those primitives into bigger pieces. Those bigger pieces should be indistinguishable from the primitives: they compose the same way and are invoked the same way. The key test: “new words defined by a library should look just like the primitives of the language.”
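One way to apply Steele's test in Python is to give a library-defined type the same operators the built-in numbers have. The sketch below assumes a hypothetical Money type; it is an illustration, not an example taken from the talk.

```python
from dataclasses import dataclass


# Hypothetical library-defined "word"; frozen so it behaves like a value.
@dataclass(frozen=True)
class Money:
    cents: int

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

    def __mul__(self, factor: int) -> "Money":
        return Money(self.cents * factor)


# A built-in primitive used with the built-in sum().
total_count = sum([1, 2, 3])

# The library-defined type is invoked the same way; only the start value differs.
total_price = sum([Money(150), Money(250)], Money(0))

# Arithmetic reads the same as it does for int.
two_coffees = Money(150) * 2
```

A reader who knows how int behaves with + and sum() needs no new rules to read the Money lines, which is the sense in which the new word looks like a primitive.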

The result is a layered vocabulary:

Layer 0: The fundamental primitives (types, basic operations)
Layer 1: Compositions of primitives into common patterns
Layer 2: Compositions of Layer 1 concepts into higher-level operations
Layer N: The vocabulary at which the application speaks

Each layer is built entirely from the layer below it. Each layer compresses the vocabulary of the lower layer — it gives a name to a frequently-occurring pattern, allowing the reader to treat a complex idea as a single chunk (Miller’s chunking principle; see sources.md).
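The layering can be sketched with a small, hypothetical order domain. All names here (LineItem, line_total, subtotal, total_with_tax, can_checkout) are assumptions made for the example, and amounts are integer cents to keep the arithmetic exact.

```python
from dataclasses import dataclass
from typing import Sequence


# Layer 0: primitives (a type and a basic operation on it).
@dataclass(frozen=True)
class LineItem:
    unit_price_cents: int
    quantity: int


def line_total(item: LineItem) -> int:
    return item.unit_price_cents * item.quantity


# Layer 1: a common pattern over Layer 0, given a name.
def subtotal(items: Sequence[LineItem]) -> int:
    return sum(line_total(item) for item in items)


# Layer 2: built only from Layer 1 vocabulary.
def total_with_tax(items: Sequence[LineItem], tax_percent: int) -> int:
    base = subtotal(items)
    return base + base * tax_percent // 100


# Layer N: the application speaks in domain terms.
def can_checkout(items: Sequence[LineItem], tax_percent: int, budget_cents: int) -> bool:
    return total_with_tax(items, tax_percent) <= budget_cents


cart = [LineItem(1000, 2), LineItem(500, 1)]
```

Each function mentions only names from the layer directly beneath it, so can_checkout reads as one domain-level idea without the reader having to expand subtotal or line_total.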

This is what Muratori calls semantic compression: the code’s vocabulary mirrors the problem domain, with frequently-expressed ideas given their own names and used consistently. Well-compressed code is easy to read because there is a minimal amount of it, and its semantics mirror the real “language” of the problem.
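A before-and-after sketch shows what compression looks like at a single call site. The rendering functions and the field helper are hypothetical; the point is only that a pattern written out twice gets one name.

```python
LABEL_WIDTH = 8  # hypothetical column width for labels


# Before: the same "padded label, colon, value" pattern is spelled out twice.
def render_user_before(name: str, email: str) -> str:
    lines = []
    lines.append("Name".ljust(LABEL_WIDTH) + ": " + name)
    lines.append("Email".ljust(LABEL_WIDTH) + ": " + email)
    return "\n".join(lines)


# After: the repeated idea gets a name taken from the problem domain.
def field(label: str, value: str) -> str:
    return label.ljust(LABEL_WIDTH) + ": " + value


def render_user(name: str, email: str) -> str:
    return "\n".join([field("Name", name), field("Email", email)])
```

Both versions produce the same text, but the second says "a user is a Name field and an Email field", which is closer to how the problem would be described aloud.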

Remedy

Identify the primitives. What are the smallest, most precise operations and types in your domain? These should be small, pure, and independently understandable.
Look for repeated patterns. When the same combination of primitives appears in multiple places, extract it and name it, but only once you have seen it at least twice (Muratori’s rule: “make your code usable before you try to make it reusable”).
Build upward. Each successive layer should feel like a natural vocabulary built from the layer below. Reading a high-level function should feel like reading a sentence in the domain language.
Ensure composability. The composed pieces should combine the same way the primitives do. If composing two high-level concepts requires dropping down to the primitive layer, the abstraction has a leak.
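The composability check in the last step can be illustrated with a hypothetical scheduling domain. Interval, intersect, office_hours, and afternoon are assumed names; the leaky variant is included only to show the failure mode.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Primitive type: a span of minutes since midnight.
@dataclass(frozen=True)
class Interval:
    start: int
    end: int


# Primitive means of combination.
def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Interval(start, end) if start < end else None


# Leaky: returns raw minutes, so every caller must rebuild an Interval by hand.
def office_hours_leaky() -> Tuple[int, int]:
    return (9 * 60, 17 * 60)


# Composable: higher-level concepts return the same type the primitives use.
def office_hours() -> Interval:
    return Interval(9 * 60, 17 * 60)


def afternoon() -> Interval:
    return Interval(12 * 60, 20 * 60)


# The high-level concepts combine directly with the primitive combinator.
overlap = intersect(office_hours(), afternoon())

# The leaky version forces a drop to the primitive layer before combining.
overlap_via_leak = intersect(Interval(*office_hours_leaky()), afternoon())
```

Both calls compute the same overlap, but only the first stays at the level of the domain vocabulary. The manual Interval(*...) wrapping is the kind of drop to the primitive layer that signals a leak.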