Text Is The Substrate
Semantic merge should not replace text.
It should stand on it.
The source file is the durable object: bytes, line endings, whitespace, comments, string quotes, import ordering, and small local style choices. The semantic layer can explain what those bytes mean, but it should not pretend the bytes are unimportant.
Text is the substrate.
Text Carries Meaning
A parser can tell the system that a function exists.
The source text tells it how that function was written.
    symbol: displayName
    span: offsets 38..96
    trivia: leading comment, blank line, quote style
    text: exact bytes the developer will review

Those details are not decoration. They are part of how code survives repeated edits.
A merge that preserves the symbol but destroys the surrounding comments may be technically valid and still be a bad merge.
A merge that rewrites a whole file to apply one import change may pass tests and still make review harder.
Semantic evidence needs source fidelity underneath it.
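One way to make source fidelity checkable is to ask whether a merge changed anything outside the span it claimed to edit. The sketch below is a minimal illustration, not a prescribed design: the `Span` shape, the `preservesSurroundings` helper, and the sample sources are all hypothetical, and offsets are UTF-16 string offsets rather than raw bytes.

```typescript
// Hypothetical shape: a half-open range of UTF-16 offsets into the source string.
interface Span {
  start: number;
  end: number;
}

// True when `after` differs from `before` only inside `span`: everything
// outside the span (comments, blank lines, quote style) is character-identical.
function preservesSurroundings(before: string, after: string, span: Span): boolean {
  const prefix = before.slice(0, span.start);
  const suffix = before.slice(span.end);
  if (after.length < prefix.length + suffix.length) return false;
  return (
    after.slice(0, prefix.length) === prefix &&
    after.slice(after.length - suffix.length) === suffix
  );
}

const sourceBefore =
  "// keep this comment\nimport { readUser } from './db';\n\nexport const x = 'a';\n";
const specifiers = "{ readUser }";
const importSpan: Span = {
  start: sourceBefore.indexOf(specifiers),
  end: sourceBefore.indexOf(specifiers) + specifiers.length,
};

// One import change applied inside the span only.
const narrowMerge =
  sourceBefore.slice(0, importSpan.start) +
  "{ readUser, writeUser }" +
  sourceBefore.slice(importSpan.end);

// The same change applied by reprinting the file: comment and quote style are lost.
const reprintedMerge =
  'import { readUser, writeUser } from "./db";\nexport const x = "a";\n';

console.log(preservesSurroundings(sourceBefore, narrowMerge, importSpan)); // true
console.log(preservesSurroundings(sourceBefore, reprintedMerge, importSpan)); // false
```

Both merges add the same import. Only the first leaves the reviewer a one-line diff.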
Hashes Make Claims Concrete
Evidence should bind back to the source it describes.
    parser evidence
      source hash: h1
      span: import declaration
      claim: added writeUser
    current source
      source hash: h2

If the hash changed, the evidence may still be useful, but it is no longer automatically current.
This is what keeps semantic merge honest. The system can say:
    same text: proof can be reused
    same region, new text: proof may need revalidation
    different region: proof does not apply

Without hashes, a semantic proof is easy to over-widen. It starts as a statement about one file state and quietly becomes a statement about whatever file happens to exist now.
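The three cases can be sketched as a small check. Everything named here is an assumption made for illustration: the `Evidence` shape, the `regionId` strings, and `hashText`, which uses a djb2-style hash only to stay dependency-free. A real system would more likely use a cryptographic content hash.

```typescript
// Stand-in content hash (djb2, xor variant). Not cryptographic; illustration only.
function hashText(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// Hypothetical evidence record: a claim bound to one region in one file state.
interface Evidence {
  regionId: string;   // e.g. "import:./db"
  sourceHash: string; // hash of the region text the claim was made about
  claim: string;
}

type Freshness = "reuse" | "revalidate" | "not-applicable";

function checkEvidence(evidence: Evidence, regionId: string, currentText: string): Freshness {
  // Different region: the proof says nothing about this text.
  if (evidence.regionId !== regionId) return "not-applicable";
  // Same region: the proof is current only if the text is unchanged.
  return evidence.sourceHash === hashText(currentText) ? "reuse" : "revalidate";
}

const importAtH1 = "import { readUser, writeUser } from './db';";
const importAtH2 = "import { readUser, writeUser, deleteUser } from './db';";

const parserEvidence: Evidence = {
  regionId: "import:./db",
  sourceHash: hashText(importAtH1),
  claim: "added writeUser",
};

console.log(checkEvidence(parserEvidence, "import:./db", importAtH1)); // "reuse"
console.log(checkEvidence(parserEvidence, "import:./db", importAtH2)); // "revalidate"
console.log(checkEvidence(parserEvidence, "function:displayName", importAtH1)); // "not-applicable"
```

The check never upgrades stale evidence on its own. A changed hash only routes the claim to revalidation.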
Spans Are The Bridge
A semantic region needs an address in the source.
    region: exported function displayName
    meaning: public callable
    source span: line 8, column 1 through line 12, column 2
    text hash: h_displayName

The region lets the system compare meaning.
The span lets the system write the final file.
That bridge matters because many useful merges are neither pure text nor pure graph operations. They are adaptations:
    export interface User {
      id: string;
      fullName: string;
      nickname?: string;
    }

    export function displayName(user: User) {
      return user.nickname ?? user.fullName;
    }

| Span | Record | Offset | Claim |
|---|---|---|---|
| s1 | exported type (type declaration) | 0..93 | public User shape is the contract surface |
| s2 | field (property symbol) | 46..54 | renamed public field current at head |
| s3 | helper (function declaration) | 112..123 | worker helper is preserved in output |
| s4 | read (use site) | 169..177 | helper read was rebased through the rename |
A semantic record is useful because it points back to exact source spans: the declaration, the renamed field, the helper, and the rebased use site.
    move import
    rename field
    insert declaration
    preserve comment
    emit final source

The semantic layer chooses the operation. The text layer proves where it happened.
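A rough sketch of that division of labor follows. It assumes a hypothetical `SpanEdit` record in which the semantic layer names the operation and the text layer supplies the span and the text it expects to find there. `planEdit` is a stand-in for a parser locating the span, the renamed identifiers are invented for the example, and edits are assumed not to overlap.

```typescript
interface Span {
  start: number; // UTF-16 offset, inclusive
  end: number;   // UTF-16 offset, exclusive
}

interface SpanEdit {
  operation: string;    // chosen by the semantic layer, e.g. "rename field"
  span: Span;           // where the text layer says it happens
  expectedText: string; // span content at planning time; a stored text hash would serve the same role
  replacement: string;
}

// Stand-in for a parser: locate the first occurrence of `find` in `source`.
function planEdit(source: string, operation: string, find: string, replacement: string): SpanEdit {
  const start = source.indexOf(find);
  if (start === -1) throw new Error("text not found: " + find);
  return { operation, span: { start, end: start + find.length }, expectedText: find, replacement };
}

function emitFinalSource(source: string, edits: SpanEdit[]): string {
  // Apply back to front so offsets of earlier spans stay valid.
  const ordered = edits.slice().sort((a, b) => b.span.start - a.span.start);
  let output = source;
  for (const edit of ordered) {
    const actual = output.slice(edit.span.start, edit.span.end);
    if (actual !== edit.expectedText) {
      throw new Error("stale span for " + edit.operation + ": found " + JSON.stringify(actual));
    }
    output = output.slice(0, edit.span.start) + edit.replacement + output.slice(edit.span.end);
  }
  return output;
}

const headSource =
  "import { User } from './types';\n\n" +
  "// nickname wins when present\n" +
  "export function displayName(user: User) {\n" +
  "  return user.nickname ?? user.fullName;\n" +
  "}\n";

const plannedEdits = [
  planEdit(headSource, "move import", "'./types'", "'./models/user'"),
  planEdit(headSource, "rename field", "user.fullName", "user.legalName"),
];

console.log(emitFinalSource(headSource, plannedEdits));
```

The comment and blank line survive because nothing outside the two spans is ever reprinted. If the file changes between planning and emitting, the expected-text check fails instead of writing into the wrong place.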
Trivia Is Evidence Too
Whitespace, comments, directives, and local formatting are often called trivia.
That name is misleading.

    "use client"
    // keep this side effect before imports
    /* generated by design token build */

Some trivia changes runtime behavior. Some preserves intent. Some tells a reviewer why the code looks strange.
A system that wants to merge autonomously should know the difference between:
    safe normalization: reorder import specifiers
    risky rewrite: drop source directive
    review needed: remove explanatory comment near effect order

It does not need to understand every comment as natural language. It does need to avoid treating non-AST material as disposable.
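As a rough illustration, a merge tool could classify each line it is about to drop by shape alone, without reading comments as natural language. The `classifyRemoval` function and its rules are assumptions made for this sketch. In particular, treating blank-line removal as safe is a policy choice that does not come from any standard, and the directive pattern only recognizes simple `"use ..."` strings.

```typescript
type Verdict = "safe normalization" | "review needed" | "risky rewrite";

// Classify the removal of a single source line by its shape.
// Returns null for lines that are ordinary code, which the semantic layer handles.
function classifyRemoval(line: string): Verdict | null {
  const trimmed = line.trim();

  // Assumption: dropping a blank line is a formatting change only.
  if (trimmed === "") return "safe normalization";

  // Directives such as "use client" or 'use strict' can change runtime behavior.
  if (/^(['"])use [a-z]+\1;?$/.test(trimmed)) return "risky rewrite";

  // Comments may carry intent (effect order, generated-file markers).
  if (trimmed.indexOf("//") === 0 || trimmed.indexOf("/*") === 0) return "review needed";

  return null;
}

const droppedLines = [
  '"use client"',
  "// keep this side effect before imports",
  "/* generated by design token build */",
  "",
  "const x = 1;",
];

for (const line of droppedLines) {
  console.log(JSON.stringify(line), "->", classifyRemoval(line));
}
```

The classifier does not decide whether a comment still matters. It only refuses to let the removal pass silently.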
Text Diff Is Still Useful
Line diffs are conservative.
That is a strength.
A text merge can say, "these edits do not overlap at the byte or line level." A semantic merge can then add, "these edits also do not overlap in symbol, type, selector, or runtime surface."
Those are compatible layers.
    text layer: can this be written without direct overlap?
    semantic layer: does the meaning still compose?
    evidence layer: which claims have proof?
    admission layer: should shared state move?

The goal is not to throw away text merging. The goal is to keep text merging as the low-level safety floor and add better routes above it.
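The first two layers can be sketched as a pair of overlap checks run in order. The `PlannedChange` shape, the symbol ids, and the route names are hypothetical, and the evidence and admission layers are deliberately left out of this sketch.

```typescript
// Hypothetical summary of one side's change.
interface PlannedChange {
  lines: [number, number]; // inclusive line range touched
  symbols: string[];       // ids of symbols the change reads or writes
}

type MergeRoute = "needs higher route" | "semantic overlap" | "composes";

// Text layer: do the two edits touch any of the same lines?
function linesOverlap(a: PlannedChange, b: PlannedChange): boolean {
  return a.lines[0] <= b.lines[1] && b.lines[0] <= a.lines[1];
}

// Semantic layer: do the two edits touch any of the same symbols?
function symbolsOverlap(a: PlannedChange, b: PlannedChange): boolean {
  return a.symbols.some((s) => b.symbols.indexOf(s) !== -1);
}

function chooseRoute(a: PlannedChange, b: PlannedChange): MergeRoute {
  // The text floor cannot clear this pair; a semantic route or a reviewer must.
  if (linesOverlap(a, b)) return "needs higher route";
  // Textually clean, but both sides touch the same meaning.
  if (symbolsOverlap(a, b)) return "semantic overlap";
  return "composes";
}

const renameField: PlannedChange = { lines: [3, 3], symbols: ["User.fullName"] };
const editHelper: PlannedChange = { lines: [8, 10], symbols: ["displayName", "User.fullName"] };
const addImport: PlannedChange = { lines: [1, 1], symbols: ["writeUser"] };

console.log(chooseRoute(renameField, editHelper)); // "semantic overlap"
console.log(chooseRoute(renameField, addImport));  // "composes"
console.log(chooseRoute(editHelper, { lines: [10, 12], symbols: [] })); // "needs higher route"
```

The rename and the helper edit never share a line, so a text merge alone would accept them. The symbol check is what notices that both depend on the same field.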
The Mental Model
Treat source text as the material of the system.
Semantic structures are indexes, proofs, and routes over that material.
They make more merges possible, but they should always point back to exact source: hashes, spans, parser evidence, and final emitted text.
A merge system becomes more reliable when it remembers that meaning is not floating above the file. Meaning is attached to bytes.