A self-hosted, 4-pass Modula-2 compiler targeting the Lilith computer's M-code instruction set, originally developed at ETH Zurich (~1981–1982). The compiler is written entirely in Modula-2.
The compiler follows a classic multi-pass design where each pass reads output produced by the previous one. This separation keeps each pass simple, focused, and independently testable.
The MCBase module coordinates pass scheduling and maintains the global compilation state (compstat). Each pass is launched as a separate run of the compiler binary with a different pass selector.
Compilation Pipeline
| Stage | Main Module | Reads | Writes |
|-------|-------------|-------|--------|
| Pass 1 | MCP1MAIN.MOD | Source (.MOD) | IL1, ASCII |
| Pass 2 | MCP2MAIN.MOD | IL1, ASCII | IL2 |
| Pass 3 | MCP3MAIN.MOD | IL1, IL2 | IL1 (rewritten) |
| Pass 4 | MCP4MAIN.MOD | IL1, IL2 | OBJ, REF |
Pass 1 — Lexical & Syntax Analysis
Entry point: MCP1MAIN.MOD
Responsibilities
Scans source characters and produces a token stream.
Parses the full Modula-2 grammar using recursive descent.
Resolves keywords and identifiers via hash tables.
Writes the IL1 interpass file and the ASCII identifier pool.
Key Modules
| Module | Role |
|--------|------|
| MCP1IO | Character-level input; GetSy() returns the next symbol; HashIdent() maps names to spelling indices (Spellix). |
| MCP1Iden | InitIdTables() populates hash tables with Modula-2 reserved words and predefined identifiers. |
| MCP1REAL | Parses and stores floating-point literal constants. |
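As a rough illustration of how HashIdent() might map a name to a Spellix, the sketch below stores each identifier once in a byte pool and returns its offset. It is written in Python for brevity. The `IdentPool` class, the NUL terminator, and the use of a `dict` in place of the compiler's hash table are assumptions, not details taken from MCP1IO.

```python
class IdentPool:
    """Hypothetical identifier pool: each distinct name is stored once."""

    def __init__(self):
        self.pool = bytearray()   # stands in for the ASCII identifier pool
        self.index = {}           # name -> offset; stands in for the hash table

    def hash_ident(self, name):
        """Return the pool offset of `name`, adding it on first sight."""
        spix = self.index.get(name)
        if spix is None:
            spix = len(self.pool)
            # Assumed layout: ASCII characters followed by a NUL terminator.
            self.pool += name.encode("ascii") + b"\0"
            self.index[name] = spix
        return spix

    def spelling(self, spix):
        """Recover the name stored at offset `spix`."""
        end = self.pool.index(0, spix)
        return self.pool[spix:end].decode("ascii")
```

Because a repeated name returns the same offset, later passes can compare identifiers by index instead of by string.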
Parsing Strategy
The parser uses symbol-set-based error recovery: every grammar rule receives a fsys (follow set) parameter expressed as a BITSET. On an unexpected token the parser emits an error, skips tokens until a member of fsys ∪ first_set is found, and resumes normally. This allows compilation to continue despite syntax errors.
Symset operations
Set1(sy), Set2(sy1,sy2), Set3(sy1,sy2,sy3) — construct sets
InSet(sy, s) — fast membership test
AddSet, SubSet, InclSet — set arithmetic
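The recovery scheme described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the compiler's Modula-2 code: the token names, the `Parser` class, and the toy grammar (assignments of the form `ident := ident` separated by `;`) are hypothetical, and `frozenset` stands in for BITSET. Only the idea of passing a follow set `fsys` into each rule and skipping to a synchronising token comes from the text.

```python
# Hypothetical token kinds; the real compiler has its own symbol enumeration.
IDENT, BECOMES, SEMICOLON, END, EOF = "ident", ":=", ";", "END", "eof"


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [EOF]
        self.pos = 0
        self.errors = []   # token positions at which an error was reported
        self.parsed = 0    # number of well-formed assignments

    @property
    def sy(self):
        return self.tokens[self.pos]

    def get_sy(self):
        if self.sy != EOF:
            self.pos += 1

    def skip_to(self, stop):
        """Report an error, then skip tokens until one in `stop` (or EOF)."""
        self.errors.append(self.pos)
        while self.sy not in stop and self.sy != EOF:
            self.get_sy()

    def assignment(self, fsys):
        for expected in (IDENT, BECOMES, IDENT):
            if self.sy != expected:
                self.skip_to(fsys)   # resynchronise on the follow set
                return
            self.get_sy()
        self.parsed += 1

    def statement_sequence(self, fsys):
        first = frozenset({IDENT})
        follow = fsys | {SEMICOLON, EOF}
        while True:
            if self.sy in first:
                self.assignment(follow)
            elif self.sy not in follow:
                self.skip_to(follow | first)   # fsys united with the first set
                continue
            if self.sy == SEMICOLON:
                self.get_sy()
            else:
                break
```

Feeding it a sequence with one malformed statement, for example `ident := ident ; ident ident ; ident := ident END` with `fsys = {END}`, should report a single error and still parse the two valid assignments.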
Grammar Highlights
| Construct | Notes |
|-----------|-------|
| Definition module | DEFINITION MODULE … END; exports list. |
| Implementation | IMPLEMENTATION MODULE … END. |
| Block | Type / const / var / proc declarations + statement sequence. |
Forward-reference tracking across modules: Reference(), EndReference().
MCSYMFIL: Symbol file serialization / deserialization for separate compilation.
Symbol Table Entry — Identrec
Identrec
  name     : Spellix   (* index into ASCII pool *)
  link     : Idptr     (* next in chain *)
  klass    : Idclass   (* const | type | var | field | pure | func | mod | … *)
  globmodp : Idptr     (* enclosing global module *)
  CASE klass OF
    consts       : cvalue: Constval; idtyp: Stptr
    types        : idtyp: Stptr
    vars         : vaddr, vlevel; vkind: Varkind; state: Kindvar
    fields       : fldaddr: CARDINAL
    pures, funcs : procnum, plev, varlength, locp: Idptr
                   + isstandard, codeproc, codeentry/codelength
    mods         : impp, expp (import/export lists)
                   + modulekey[0..2]: CARDINAL (* version tracking *)
  END
Type Structure — Structrec
Structrec
  form  : Structform  (* enums | bools | chars | ints | cards | words |
                         subranges | reals | pointers | sets |
                         proctypes | arrays | records | hides | opens *)
  size  : CARDINAL    (* size in target words *)
  stidp : Idptr       (* defining identifier *)
  CASE form OF
    arrays    : elp (element type), ixp (index type), dyn (dynamic?)
    records   : fieldp (field list), tagp (variant tag)
    proctypes : fstparam (parameter list), rkind, funcp (return type)
    pointers  : elemp (pointed-to type)
    sets      : basep (base type)
    subranges : scalp (base scalar), min, max
    enums     : fcstp (first constant), cstnr (count)
  END
Scope Management
Scopes are maintained as a stack. Each scope corresponds to a procedure or module body:
MarkScope(id) — push new scope
ReleaseScope — pop scope, resolve pending forward references
SearchId(name) — linear search from innermost scope outward
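The three operations above can be pictured with a small Python sketch. The `ScopeStack` class and its `enter_id` and `reference` helpers are hypothetical names introduced for illustration, and the handling of forward references (a per-scope list checked when the scope is popped) is an assumption about what "resolve pending forward references" involves.

```python
class ScopeStack:
    """Hypothetical model of a stack of procedure/module scopes."""

    def __init__(self):
        self.scopes = []

    def mark_scope(self, owner):
        """Push a new scope for the procedure or module `owner`."""
        self.scopes.append({"owner": owner, "ids": {}, "pending": []})

    def enter_id(self, name, entry):
        """Declare `name` in the innermost scope."""
        self.scopes[-1]["ids"][name] = entry

    def reference(self, name):
        """Record a use of `name` that may be declared later in this scope."""
        self.scopes[-1]["pending"].append(name)

    def search_id(self, name):
        """Linear search from the innermost scope outward."""
        for scope in reversed(self.scopes):
            if name in scope["ids"]:
                return scope["ids"][name]
        return None

    def release_scope(self):
        """Pop the innermost scope; return forward references still unresolved."""
        pending = self.scopes[-1]["pending"]
        unresolved = [n for n in pending if self.search_id(n) is None]
        self.scopes.pop()
        return unresolved
```

An inner declaration shadows an outer one of the same name for as long as its scope is on the stack; after `release_scope` the outer entry becomes visible again.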
Constant Evaluation
ConstantVal() evaluates constant expressions at compile time using recursive descent. It supports all Modula-2 constant operators, overflow detection, and type compatibility rules (intcar compatibility).
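A much reduced sketch of compile-time evaluation with overflow detection follows. It is written in Python, works on an already-built expression tree rather than on the token stream, and covers only CARDINAL arithmetic. The 16-bit range, the tuple encoding of expressions, and the `ConstError` name are assumptions made for the example; the type compatibility rules mentioned above are not modelled.

```python
MAX_CARD = 0xFFFF  # assumption: CARDINAL occupies one 16-bit word


class ConstError(Exception):
    """Raised for overflow or division by zero in a constant expression."""


def const_val(expr):
    """Evaluate `expr`, which is an int or a tuple (op, left, right)."""
    if isinstance(expr, int):
        result = expr
    else:
        op, left, right = expr
        a, b = const_val(left), const_val(right)
        if op == "+":
            result = a + b
        elif op == "-":
            result = a - b
        elif op == "*":
            result = a * b
        elif op in ("DIV", "MOD"):
            if b == 0:
                raise ConstError("division by zero")
            result = a // b if op == "DIV" else a % b
        else:
            raise ValueError(f"unknown operator {op!r}")
    # Every intermediate result must fit the target type, not just the final one.
    if not 0 <= result <= MAX_CARD:
        raise ConstError(f"value {result} outside 0..{MAX_CARD}")
    return result
```

For instance, `const_val(("+", 2, ("*", 3, 4)))` folds to 14, while `("*", 256, 256)` is rejected because 65536 does not fit in the assumed 16-bit range.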
Module Keys
Each compiled definition module receives a 3-word key (modulekey[0..2]). Implementation modules verify their keys match; a mismatch prevents compilation.
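The key check itself amounts to a three-word comparison. The Python sketch below shows that step only. `ModuleKey`, `KeyMismatch`, and `check_key` are hypothetical names, and how the compiler actually generates a key is not covered here.

```python
from typing import NamedTuple


class ModuleKey(NamedTuple):
    """Three CARDINAL words, mirroring modulekey[0..2]."""
    w0: int
    w1: int
    w2: int


class KeyMismatch(Exception):
    """The key found in a symbol file differs from the expected one."""


def check_key(module, stored, expected):
    """Raise KeyMismatch unless all three key words agree."""
    if tuple(stored) != tuple(expected):
        raise KeyMismatch(
            f"{module}: key {tuple(stored)} does not match {tuple(expected)}"
        )
```

Comparing all three words at once means that recompiling a definition module, which produces a new key, invalidates every module compiled against the old one.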
Pass 3 — Body / Semantic Analysis
Entry point: MCP3MAIN.MOD
Responsibilities
Validates all executable code for semantic correctness.
Type-checks expressions, assignments, and procedure calls.
Rewrites the IL1 file with fully resolved references for Pass 4.
EnterWith() saves record address; body; ExitWith() restores
RETURN: Load return value (functions) → GenBlockReturn()
Jump Optimization
Backward branch distance is estimated before emission:
Short jump (JPB): 1-byte backward offset, so the target can lie at most 255 bytes behind the jump.
Long jump (JP): 2-word full address.
If the estimate is wrong the emitter iterates until stable.
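One common way to "iterate until stable" is to start with every jump in its short form and widen any jump whose distance no longer fits, repeating until a pass changes nothing. The Python sketch below illustrates that general technique for backward jumps. It is not taken from the Pass 4 sources: the instruction sizes, the reach constant, and the list encoding of the code are all placeholders.

```python
SHORT_REACH = 255  # assumption: largest distance a 1-byte offset can encode
SHORT_SIZE = 2     # placeholder: opcode + 1-byte offset
LONG_SIZE = 3      # placeholder: opcode + wider operand


def size_backward_jumps(items):
    """Choose short or long encodings for backward jumps.

    `items` is a list whose elements are either an int (byte size of plain
    code) or ("jump", target_index) with target_index before the jump.
    Returns {jump_index: True if the long form is needed}.
    """
    is_long = {i: False for i, it in enumerate(items) if not isinstance(it, int)}
    while True:
        # Lay out the code with the current choice of encodings.
        addr, pc = [], 0
        for i, it in enumerate(items):
            addr.append(pc)
            if isinstance(it, int):
                pc += it
            else:
                pc += LONG_SIZE if is_long[i] else SHORT_SIZE
        # Widen every short jump whose target is now out of reach.
        changed = False
        for i, it in enumerate(items):
            if isinstance(it, int) or is_long[i]:
                continue
            distance = addr[i] + SHORT_SIZE - addr[it[1]]
            if distance > SHORT_REACH:
                is_long[i] = True
                changed = True
        if not changed:
            return is_long
```

Widening one jump lengthens the code it sits in, which can push an enclosing jump out of range, and that is why a single pass is not always enough. The loop terminates because a jump only ever grows from short to long.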
Core Data Structures
Constval — Compile-Time Value
Constval = RECORD
  CASE str: Structform OF
    arrays : svalue: Stringptr        (* string constant *)
  | reals  : rvalue: POINTER TO REAL  (* double-precision float *)
  ELSE
    value : CARDINAL                  (* integer / boolean / char / set *)
  END
END
Symbol files (.SYM) enable separate compilation: a definition module is compiled once and its type information is saved; implementation modules or client modules read it back without re-parsing the source.
Module keys (modulekey[0..2]: CARDINAL) guard against stale symbol files: any mismatch between the stored key and the current compilation aborts with a symbol error (symerrs).
M-Code Instruction Set
Selected instructions generated by Pass 4 (see MCMNEMON.DEF for full list):
| Category | Mnemonics |
|----------|-----------|
| Load | LI (immediate), LLW (local word), LGW (global word), LSW (stack-relative) |