Text Is The Substrate

3 days ago

Semantic merge should not replace text.

It should stand on it.

The source file is the durable object: bytes, line endings, whitespace, comments, string quotes, import ordering, and small local style choices. The semantic layer can explain what those bytes mean, but it should not pretend the bytes are unimportant.

Text is the substrate.

Figure: TEXT, HASHES, SPANS, TRIVIA, SEMANTIC, WRITE.

Semantic evidence sits above source text, hashes, spans, and trivia. The final output still has to become text again.

Text Carries Meaning

A parser can tell the system that a function exists.

The source text tells it how that function was written.

symbol: displayName
span: offsets 38..96
trivia: leading comment, blank line, quote style
text: exact bytes the developer will review

Those details are not decoration. They are part of how code survives repeated edits.

A merge that preserves the symbol but destroys the surrounding comments may be technically valid and still be a bad merge.

A merge that rewrites a whole file to apply one import change may pass tests and still make review harder.

Semantic evidence needs source fidelity underneath it.
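Here is a minimal TypeScript sketch of the symbol/span/trivia/text record. The `SourceRecord` shape and `recordFor` helper are hypothetical. Offsets are measured in UTF-16 code units of a JavaScript string, which is an assumption; a tool working on raw bytes would use byte offsets. The property that matters is that `text` is sliced from the source rather than re-printed from a syntax tree.

```typescript
// Hypothetical record shape; field names mirror the example record,
// not any real parser's API.
interface Span {
  start: number; // inclusive offset into the source string
  end: number; // exclusive offset
}

interface SourceRecord {
  symbol: string;
  span: Span;
  trivia: string[]; // free-form notes such as "leading comment" or "single quotes"
  text: string; // exactly the characters the span covers
}

// Build a record by slicing the original source, so the text a reviewer
// sees is never regenerated from an AST.
function recordFor(source: string, symbol: string, span: Span, trivia: string[]): SourceRecord {
  if (span.start < 0 || span.end > source.length || span.start > span.end) {
    throw new RangeError(`span ${span.start}..${span.end} is outside the source`);
  }
  return { symbol, span, trivia, text: source.slice(span.start, span.end) };
}

const source = "// greet the user\nexport const hello = 'hi';\n";
const start = source.indexOf("export");
const end = source.indexOf(";") + 1;
const record = recordFor(source, "hello", { start, end }, ["leading comment", "single quotes"]);
console.log(record.text); // export const hello = 'hi';
```

The comment sits outside the span but is still named in `trivia`. That keeps it visible to whatever writes the file later.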

Hashes Make Claims Concrete

Evidence should bind back to the source it describes.

parser evidence
  source hash: h1
  span: import declaration
  claim: added writeUser
 
current source
  source hash: h2

If the hash changed, the evidence may still be useful, but it is no longer automatically current.

This is what keeps semantic merge honest. The system can say:

same text: proof can be reused
same region, new text: proof may need revalidation
different region: proof does not apply
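The three outcomes can be sketched as a small classifier. The `RegionProof` shape is hypothetical: it stores the span a proof was made over and a hash of that span's text. The sketch also assumes the queried span is already expressed in the current file's offsets. FNV-1a stands in for a real content hash here, and it is not collision-resistant.

```typescript
type Verdict = "reuse" | "revalidate" | "not-applicable";

interface Span {
  start: number; // inclusive
  end: number; // exclusive
}

// Hypothetical proof record: where the proof was made and what the text was.
interface RegionProof {
  span: Span;
  textHash: string;
}

// 32-bit FNV-1a over UTF-16 code units: a small stand-in for a real content hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

function overlaps(a: Span, b: Span): boolean {
  return a.start < b.end && b.start < a.end;
}

// Decide whether an existing proof speaks to the region a merge is asking about.
function classify(proof: RegionProof, currentSource: string, asked: Span): Verdict {
  if (!overlaps(proof.span, asked)) return "not-applicable"; // different region
  const now = currentSource.slice(proof.span.start, proof.span.end);
  return fnv1a(now) === proof.textHash ? "reuse" : "revalidate"; // same text vs new text
}

const base = "const a = 1;\nconst b = 2;\n";
const proof: RegionProof = { span: { start: 0, end: 12 }, textHash: fnv1a("const a = 1;") };

console.log(classify(proof, base, { start: 0, end: 12 })); // reuse
console.log(classify(proof, "const a = 9;\nconst b = 2;\n", { start: 0, end: 12 })); // revalidate
console.log(classify(proof, base, { start: 13, end: 25 })); // not-applicable
```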

Without hashes, a semantic proof is easy to over-widen. It starts as a statement about one file state and quietly becomes a statement about whatever file happens to exist now.
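One way to keep a proof from over-widening is to store the hash of the whole file state it was made against and check that hash before trusting the claim. The `Evidence` shape and `status` helper below are hypothetical and mirror the parser-evidence example. FNV-1a is only a stand-in for a real content hash. A mismatch marks the evidence stale rather than deleting it, because it may still be useful after revalidation.

```typescript
// Hypothetical evidence record bound to one file state.
interface Evidence {
  sourceHash: string; // hash of the file state the parser read
  span: string; // e.g. "import declaration"
  claim: string; // e.g. "added writeUser"
}

type Status = "current" | "stale";

// 32-bit FNV-1a over UTF-16 code units: a stand-in, not collision-resistant.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Evidence is only automatically current for the exact state it was made on.
function status(ev: Evidence, currentSource: string): Status {
  return ev.sourceHash === fnv1a(currentSource) ? "current" : "stale";
}

const h1Source = 'import { readUser, writeUser } from "./db";\n';
const evidence: Evidence = {
  sourceHash: fnv1a(h1Source),
  span: "import declaration",
  claim: "added writeUser",
};

const h2Source = h1Source.replace("writeUser", "wroteUser"); // one character differs
console.log(status(evidence, h1Source)); // current
console.log(status(evidence, h2Source)); // stale
```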

Spans Are The Bridge

A semantic region needs an address in the source.

region: exported function displayName
meaning: public callable
source span: line 8, column 1 through line 12, column 2
text hash: h_displayName

The region lets the system compare meaning.

The span lets the system write the final file.
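Writing through a span can be sketched as converting the span's line and column to string offsets and splicing only those characters. The sketch assumes 1-based lines and columns, as in the example region record, and treats the end position as exclusive (one past the last character). Both conventions are assumptions. `toOffset` and `writeSpan` are hypothetical helpers.

```typescript
// 1-based line and column, matching the "line 8, column 1" style of the region record.
interface Position {
  line: number;
  column: number;
}

// Convert a 1-based line/column to a 0-based offset into the string.
function toOffset(source: string, pos: Position): number {
  let offset = 0;
  for (let line = 1; line < pos.line; line++) {
    const newline = source.indexOf("\n", offset);
    if (newline === -1) throw new RangeError(`line ${pos.line} is past the end of the source`);
    offset = newline + 1;
  }
  return offset + pos.column - 1;
}

// Replace exactly the characters between `from` (inclusive) and `to` (exclusive).
// Everything outside the span, including comments, is copied through unchanged.
function writeSpan(source: string, from: Position, to: Position, replacement: string): string {
  const start = toOffset(source, from);
  const end = toOffset(source, to);
  if (end < start) throw new RangeError("span ends before it starts");
  return source.slice(0, start) + replacement + source.slice(end);
}

const original = "// keep me\nfunction f() {\n  return 1;\n}\n";
const updated = writeSpan(
  original,
  { line: 2, column: 1 },
  { line: 4, column: 2 }, // one past the closing brace
  "function f() {\n  return 2;\n}",
);
console.log(JSON.stringify(updated)); // "// keep me\nfunction f() {\n  return 2;\n}\n"
```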

That bridge matters because many useful merges are neither pure text nor pure graph operations. They are adaptations:

export interface User {
  id: string;
  fullName: string;
  nickname?: string;
}

export function displayName(user: User) {
  return user.nickname ?? user.fullName;
}

Span	Record	Offsets	Claim
s1	exported type (type declaration)	0..93	public User shape is the contract surface
s2	field (property symbol)	46..54	renamed public field current at head
s3	helper (function declaration)	112..123	worker helper is preserved in output
s4	read (use site)	169..177	helper read was rebased through the rename

A semantic record is useful because it points back to exact source spans: the declaration, the renamed field, the helper, and the rebased use site.
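Rebasing a use site through a rename, as in the s4 row, can be sketched as mapping offsets across an earlier edit. The `Edit` shape and the `rebase` and `rebaseSpan` helpers are hypothetical. Offsets before the edit stay unchanged. Offsets at or after its end shift by the change in length. Offsets strictly inside the replaced text have no exact image, so the sketch returns `null` to force revalidation.

```typescript
// Hypothetical edit: replace oldSource[start, end) with `text`.
interface Edit {
  start: number;
  end: number;
  text: string;
}

interface Span {
  start: number;
  end: number;
}

function applyEdit(source: string, edit: Edit): string {
  return source.slice(0, edit.start) + edit.text + source.slice(edit.end);
}

// Map one old-source offset into the new source, or null if the edit swallowed it.
function rebase(offset: number, edit: Edit): number | null {
  if (offset <= edit.start) return offset;
  if (offset >= edit.end) return offset + edit.text.length - (edit.end - edit.start);
  return null;
}

function rebaseSpan(span: Span, edit: Edit): Span | null {
  const start = rebase(span.start, edit);
  const end = rebase(span.end, edit);
  return start === null || end === null ? null : { start, end };
}

const oldSource = "type U = { fullName: string };\nconst s = u.fullName;\n";
const declStart = oldSource.indexOf("fullName");
const rename: Edit = { start: declStart, end: declStart + "fullName".length, text: "name" };

const readStart = oldSource.lastIndexOf("fullName");
const read: Span = { start: readStart, end: readStart + "fullName".length };

const newSource = applyEdit(oldSource, rename);
const moved = rebaseSpan(read, rename);
if (moved !== null) {
  console.log(newSource.slice(moved.start, moved.end)); // fullName
}
```

The read still says `fullName` because this sketch applies only the declaration edit. The point is that its span is found again in the new text without re-parsing, so a later edit can rename it at the right address.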

move import
rename field
insert declaration
preserve comment
emit final source

The semantic layer chooses the operation. The text layer proves where it happened.
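That split can be sketched as an emitter. It takes span edits chosen by the semantic layer, refuses overlapping ones, and copies every untouched character through. The `SpanEdit` shape and `emit` function are hypothetical. The labels reuse the operation names from the list above. The sketch does not re-parse or re-hash its output.

```typescript
// Hypothetical edit shape: replace source[start, end) with `text`.
interface SpanEdit {
  start: number;
  end: number;
  text: string;
  label: string; // which semantic operation produced this edit
}

// Apply edits against the original text in one left-to-right pass.
// Characters between edits are copied through unchanged.
function emit(source: string, edits: SpanEdit[]): string {
  const sorted = edits.slice().sort((a, b) => a.start - b.start || a.end - b.end);
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].start < sorted[i - 1].end) {
      throw new Error(`"${sorted[i - 1].label}" and "${sorted[i].label}" overlap`);
    }
  }
  let out = "";
  let cursor = 0;
  for (const edit of sorted) {
    out += source.slice(cursor, edit.start) + edit.text;
    cursor = edit.end;
  }
  return out + source.slice(cursor);
}

const src = "// note\nlet oldName = 1;\n";
const at = src.indexOf("oldName");
const out = emit(src, [
  { start: src.length, end: src.length, text: "let other = 2;\n", label: "insert declaration" },
  { start: at, end: at + "oldName".length, text: "newName", label: "rename field" },
]);
console.log(JSON.stringify(out)); // "// note\nlet newName = 1;\nlet other = 2;\n"
```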

Trivia Is Evidence Too

Whitespace, comments, directives, and local formatting are often called trivia.

That name is misleading.

"use client"
// keep this side effect before imports
/* generated by design token build */

Some trivia changes runtime behavior. Some preserves intent. Some tells a reviewer why the code looks strange.

A system that wants to merge autonomously should know the difference between:

safe normalization: reorder import specifiers
risky rewrite: drop source directive
review needed: remove explanatory comment near effect order

It does not need to understand every comment as natural language. It does need to avoid treating non-AST material as disposable.
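The three categories can be approximated with a crude, regex-based classifier. It is a heuristic sketch, not a parser. It treats any line that is only a `"use …"` string as a directive, it would mistake `//` inside a string for a comment, and it treats sorting `import { … }` specifiers as the only safe normalization. The `"unclassified"` fallback and all function names are additions for illustration.

```typescript
type TriviaVerdict = "safe normalization" | "risky rewrite" | "review needed" | "unclassified";

// A line that is only a directive string, e.g. "use client" or 'use strict';
const DIRECTIVE_LINE = /^\s*(["'])use [a-z ]+\1;?\s*$/;
// Line and block comments. Crude: it would also match "//" inside a string.
const COMMENT = /\/\/[^\n]*|\/\*[\s\S]*?\*\//g;

function directives(src: string): string[] {
  return src
    .split("\n")
    .filter((line) => DIRECTIVE_LINE.test(line))
    .map((line) => line.trim().replace(/;$/, ""));
}

function comments(src: string): string[] {
  return (src.match(COMMENT) || []).map((c) => c.trim());
}

// Sort the names inside `import { ... }` so specifier order stops mattering.
function sortSpecifiers(src: string): string {
  return src.replace(/import\s*\{([^}]*)\}/g, (_match: string, list: string) => {
    const names = list
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s !== "")
      .sort();
    return "import { " + names.join(", ") + " }";
  });
}

// True if anything present before is missing after.
function lost(before: string[], after: string[]): boolean {
  return before.some((item) => after.indexOf(item) === -1);
}

function classifyTrivia(before: string, after: string): TriviaVerdict {
  if (lost(directives(before), directives(after))) return "risky rewrite";
  if (lost(comments(before), comments(after))) return "review needed";
  if (sortSpecifiers(before) === sortSpecifiers(after)) return "safe normalization";
  return "unclassified"; // a real code change; other layers have to judge it
}

const before =
  '"use client"\n// keep this side effect before imports\nimport { writeUser, readUser } from "./db";\n';
const reordered = before.replace("writeUser, readUser", "readUser, writeUser");
const noDirective = before.replace('"use client"\n', "");
const noComment = before.replace("// keep this side effect before imports\n", "");

console.log(classifyTrivia(before, reordered)); // safe normalization
console.log(classifyTrivia(before, noDirective)); // risky rewrite
console.log(classifyTrivia(before, noComment)); // review needed
```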

Text Diff Is Still Useful

Line diffs are conservative.

That is a strength.

A text merge can say, "these edits do not overlap at the byte or line level." A semantic merge can then add, "these edits also do not overlap in symbol, type, selector, or runtime surface."

Those are compatible layers.
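The two statements can be sketched as two independent checks. The `Change` shape is hypothetical. It holds a line range in the base file and a list of affected surface names (symbols, types, selectors), which a real system would get from a parser; the surface labels below are invented. Many line-based merges are stricter and also treat directly adjacent lines as a conflict, while this sketch only checks strict overlap.

```typescript
// Hypothetical description of one side's change against the base file.
interface Change {
  firstLine: number; // 1-based, inclusive
  lastLine: number; // 1-based, inclusive
  surfaces: string[]; // symbols, types, selectors, runtime surfaces touched
}

// Text layer: do the edited line ranges overlap?
function textDisjoint(a: Change, b: Change): boolean {
  return a.lastLine < b.firstLine || b.lastLine < a.firstLine;
}

// Semantic layer: do the edits touch any of the same surfaces?
function semanticDisjoint(a: Change, b: Change): boolean {
  return !a.surfaces.some((s) => b.surfaces.indexOf(s) !== -1);
}

const addImport: Change = { firstLine: 1, lastLine: 1, surfaces: ["import:writeUser"] };
const renameField: Change = { firstLine: 3, lastLine: 5, surfaces: ["type:User.fullName"] };
const readField: Change = { firstLine: 20, lastLine: 22, surfaces: ["type:User.fullName"] };

console.log(textDisjoint(addImport, renameField), semanticDisjoint(addImport, renameField)); // true true
console.log(textDisjoint(renameField, readField), semanticDisjoint(renameField, readField)); // true false
```

The second pair is the interesting case. The lines never touch, so a text merge alone would accept it, but both sides depend on the same field.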

text layer: can this be written without direct overlap?
semantic layer: does the meaning still compose?
evidence layer: which claims have proof?
admission layer: should shared state move?

The goal is not to throw away text merging. The goal is to keep text merging as the low-level safety floor and add better routes above it.
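The four layers can be sketched as an ordered list of gates, with the text layer first as the safety floor. `MergeCandidate`, its fields, and the `autoMerge` policy flag are hypothetical placeholders for whatever each layer actually computes. What the sketch shows is the ordering and the trace of which layer stopped a merge.

```typescript
// Hypothetical summary of what each layer concluded about one merge.
interface MergeCandidate {
  textOverlap: boolean; // from a line/byte-level comparison
  meaningComposes: boolean; // from semantic analysis
  unprovenClaims: string[]; // claims without current evidence
  autoMerge: boolean; // hypothetical policy flag for moving shared state
}

interface Gate {
  layer: string;
  question: string;
  passes: (c: MergeCandidate) => boolean;
}

const gates: Gate[] = [
  { layer: "text", question: "can this be written without direct overlap?", passes: (c) => !c.textOverlap },
  { layer: "semantic", question: "does the meaning still compose?", passes: (c) => c.meaningComposes },
  { layer: "evidence", question: "which claims have proof?", passes: (c) => c.unprovenClaims.length === 0 },
  { layer: "admission", question: "should shared state move?", passes: (c) => c.autoMerge },
];

// Run the gates in order and stop at the first one that fails.
function admit(c: MergeCandidate): { admitted: boolean; stoppedAt: string | null; trace: string[] } {
  const trace: string[] = [];
  for (const gate of gates) {
    const ok = gate.passes(c);
    trace.push(gate.layer + ": " + (ok ? "pass" : "fail"));
    if (!ok) return { admitted: false, stoppedAt: gate.layer, trace };
  }
  return { admitted: true, stoppedAt: null, trace };
}

const result = admit({
  textOverlap: false,
  meaningComposes: true,
  unprovenClaims: ["helper read was rebased through the rename"],
  autoMerge: true,
});
console.log(result.stoppedAt); // evidence
console.log(result.trace.join("; ")); // text: pass; semantic: pass; evidence: fail
```

Because the text gate runs first, a change that cannot be written cleanly never reaches the semantic layers.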

The Mental Model

Treat source text as the material of the system.

Semantic structures are indexes, proofs, and routes over that material.

They make more merges possible, but they should always point back to exact source: hashes, spans, parser evidence, and final emitted text.

A merge system becomes more reliable when it remembers that meaning is not floating above the file. Meaning is attached to bytes.
