A lightweight SQLite-compatible database reader built from scratch in Go. This project teaches SQLite internals by progressively implementing features that allow querying a real .sqlite database file.
cmd/main.go CLI entry point
internal/
header/header.go SQLite file header parsing
btree/btree.go B-Tree page parsing (table + index)
record/record.go Varint decoding, serial types, record format
parser/parser.go SQL SELECT parser
data/sample.db Test SQLite database
client/test.test.ts BunJS test suite
go build -o bin/mysqlite ./cmd
# CLI commands
./bin/mysqlite data/sample.db .dbinfo
./bin/mysqlite data/sample.db .tables
./bin/mysqlite data/sample.db "SELECT name FROM users;"
./bin/mysqlite data/sample.db "SELECT * FROM users WHERE country = 'Indonesia';"cd client && bun testDifficulty: Very Easy
Concepts: Binary file reading, endianness, SQLite header format
Code: internal/header/header.go
Every SQLite file starts with a 100-byte header. The first 16 bytes are the magic string SQLite format 3\0. The page size is stored as a big-endian uint16 at bytes 16-17.
func Parse(data []byte) (*Header, error) {
if string(data[0:16]) != "SQLite format 3\x00" {
return nil, fmt.Errorf("not a valid SQLite database file")
}
h := &Header{
PageSize: binary.BigEndian.Uint16(data[16:18]),
TextEncoding: binary.BigEndian.Uint32(data[56:60]),
}
return h, nil
}Key insight: SQLite uses big-endian (network byte order) for all multi-byte values in the header. The page size can be any power of 2 between 512 and 65536. A stored value of 1 means 65536.
Output:
$ ./bin/mysqlite data/sample.db .dbinfo
database page size: 4096
number of tables: 4Difficulty: Hard
Concepts: sqlite_master schema table, B-Tree leaf traversal, cell parsing, record decoding
Code: internal/btree/btree.go, internal/record/record.go
The first page (page 1) of every SQLite database contains the sqlite_master table — a special B-Tree that stores schema information. Each row in this table describes a database object (table, index, view, trigger).
How it works:
- Read page 1 (note: page 1's B-Tree header starts at offset 100, after the file header)
- Parse the B-Tree page header to get the page type and cell pointer array
- Each cell pointer points to a cell containing a record
- Decode each record — the columns are:
(type, name, tbl_name, rootpage, sql)
Varints are SQLite's variable-length integer encoding. Each byte uses 7 bits for data and 1 bit to indicate more bytes follow. Up to 9 bytes, with the 9th byte using all 8 bits.
Serial Types describe column values in a record:
| Serial Type | Meaning | Size |
|---|---|---|
| 0 | NULL | 0 |
| 1 | 8-bit int | 1 |
| 2 | 16-bit int | 2 |
| 3 | 24-bit int | 3 |
| 4 | 32-bit int | 4 |
| 5 | 48-bit int | 6 |
| 6 | 64-bit int | 8 |
| 7 | IEEE 754 float | 8 |
| 8 | Integer 0 | 0 |
| 9 | Integer 1 | 0 |
| >=12, even | BLOB | (N-12)/2 |
| >=13, odd | TEXT | (N-13)/2 |
Output:
$ ./bin/mysqlite data/sample.db .tables
users
companies
orders
bigtableDifficulty: Medium
Concepts: SQL parsing, table scans, row counting
Code: internal/parser/parser.go, cmd/main.go (countRows)
The SQL parser tokenizes simple SELECT statements into an AST with columns, table name, and optional WHERE clause.
To count rows, we traverse the table's B-Tree:
- Leaf pages: count the number of cells (= number of rows)
- Interior pages: recursively count children + right child pointer
func countRows(r *btree.Reader, rootPage int) int {
page := r.ReadPage(rootPage)
ph := btree.ParseHeader(page, rootPage == 1)
cells := btree.ReadCells(page, ph)
if ph.Type == btree.LeafTable {
return len(cells)
}
// interior: recurse into each left child + right child
...
}Output:
$ ./bin/mysqlite data/sample.db "SELECT COUNT(*) FROM users;"
5Difficulty: Hard
Concepts: Row decoding, serial types, column mapping, CREATE TABLE parsing
Code: cmd/main.go (parseColumns, scanRows, findColIdx)
To read actual data, we need to:
- Parse the
CREATE TABLESQL to get column definitions - For each cell in a leaf page, decode the record payload
- Map column names to their positions in the record
INTEGER PRIMARY KEY gotcha: When a column is declared as INTEGER PRIMARY KEY, SQLite uses the cell's rowid as the column value — it's NOT stored in the record payload. The record still has an entry for that column position, but it contains NULL. We detect this and substitute the rowid.
row := make([]interface{}, len(cols))
for i := range cols {
if i < len(vals) {
row[i] = vals[i]
}
if i == rowIDCol {
row[i] = cell.RowID // override with actual rowid
}
}Output:
$ ./bin/mysqlite data/sample.db "SELECT name FROM users;"
Alice
Bob
Charlie
Diana
EveDifficulty: Hard
Concepts: Projection, multiple column extraction, pipe-delimited output
Extends Stage 6 to support multiple columns and SELECT *. The projection logic looks up each requested column's index in the schema and extracts the corresponding value from each row.
Output:
$ ./bin/mysqlite data/sample.db "SELECT id, email FROM users;"
1|alice@email.com
2|bob@email.com
3|charlie@email.com
4|diana@email.com
5|eve@email.comDifficulty: Hard
Concepts: Expression evaluation, predicate filtering, query execution
Code: internal/parser/parser.go (WhereClause), cmd/main.go (filterRows, compare)
The parser extracts WHERE conditions into a simple Column Operator Value structure. Supported operators: =, !=, >, <, >=, <=.
After scanning all rows, we apply the filter:
func filterRows(rows [][]interface{}, cols []colDef, w *WhereClause) [][]interface{} {
colIdx := findColIdx(cols, w.Column)
var filtered [][]interface{}
for _, row := range rows {
if compare(row[colIdx], w.Operator, w.Value) {
filtered = append(filtered, row)
}
}
return filtered
}Output:
$ ./bin/mysqlite data/sample.db "SELECT * FROM companies WHERE country = 'Indonesia';"
1|Acme Corp|IndonesiaDifficulty: Hard
Concepts: B-Tree interior pages, cursor iteration, recursive traversal
Code: cmd/main.go (scanPage), internal/btree/btree.go
When a table has too many rows to fit in a single page, SQLite uses interior pages that point to child pages. The structure is:
Interior Page (type 5)
├── Cell 1: [left_child_ptr, rowid] → points to child page
├── Cell 2: [left_child_ptr, rowid] → points to child page
├── ...
└── Right child pointer → points to rightmost child
Our scan recursively traverses all child pages:
func scanPage(r *btree.Reader, pageNum int, ...) [][]interface{} {
if ph.Type == btree.LeafTable {
// decode all cells in this leaf
} else if ph.Type == btree.InteriorTable {
for _, cell := range cells {
rows = append(rows, scanPage(r, int(cell.LeftChild), ...)...)
}
rows = append(rows, scanPage(r, int(ph.RightChild), ...)...)
}
}Verified with a 500-row table that spans multiple B-tree pages.
Output:
$ ./bin/mysqlite data/sample.db "SELECT COUNT(*) FROM bigtable;"
500Difficulty: Very Hard
Concepts: Index B-Trees, query planning, indexed lookup, rowid resolution
Code: cmd/main.go (findIndex, indexSearch, lookupRowInPage), internal/btree/btree.go
When a WHERE clause matches an indexed column with =, we can skip the full table scan:
- Find the index — check
sqlite_masterfor an index on the filtered column - Search the index B-Tree — index pages use types 2 (interior index) and 10 (leaf index). Index records contain
[indexed_value, rowid] - Resolve rowids — for each matching rowid, do a B-Tree lookup on the table to fetch the full row
// Index leaf: decode payload to get [value, rowid]
vals := record.Decode(cell.Payload)
if fmtVal(vals[0]) == target {
rowIDs = append(rowIDs, vals[1].(int64))
}
// Then fetch each row by rowid using table B-Tree navigation
func lookupRowInPage(r, pageNum, rowID, cols, rowIDCol) {
// Navigate interior pages: if rowID <= cell.RowID, go left
// At leaf page: find cell with matching rowID
}Output:
$ ./bin/mysqlite data/sample.db "SELECT * FROM users WHERE email = 'alice@email.com';"
1|Alice|alice@email.com|Indonesia$ bun test test.test.ts
bun test v1.3.11-canary.108 (a305e99a)
test.test.ts:
✓ Stage 1-2: .dbinfo > prints page size [41.30ms]
✓ Stage 1-2: .dbinfo > prints table count [19.57ms]
✓ Stage 3-4: .tables > lists user tables [16.61ms]
✓ Stage 5: COUNT(*) > counts users [18.21ms]
✓ Stage 5: COUNT(*) > counts companies [19.97ms]
✓ Stage 5: COUNT(*) > counts orders [17.58ms]
✓ Stage 5: COUNT(*) > counts bigtable [15.62ms]
✓ Stage 6: Single column SELECT > SELECT name FROM users [14.14ms]
✓ Stage 7: Multiple column SELECT > SELECT id, email FROM users [12.43ms]
✓ Stage 7: Multiple column SELECT > SELECT * FROM users [12.87ms]
✓ Stage 8: WHERE clause > filter companies by country [12.21ms]
✓ Stage 8: WHERE clause > filter users by country [12.48ms]
✓ Stage 9: Full table scan > SELECT * FROM bigtable returns 500 rows [13.24ms]
✓ Stage 10: Index retrieval > index lookup by email [13.07ms]
✓ Stage 10: Index retrieval > index lookup by country returns 3 rows [15.76ms]
15 pass
0 fail
23 expect() calls
Ran 15 tests across 1 file. [391.00ms]
Each stage was committed separately for step-by-step learning:
| Commit | Stage | Description |
|---|---|---|
af7e001 |
init | Project structure, CLI skeleton, sample test database |
e98de2f |
1-2 | Parse SQLite file header and print page size via .dbinfo |
32b2a4f |
3-4 | Traverse sqlite_master B-tree, count tables and list names via .tables |
da94204 |
5 | Support SELECT COUNT(*) FROM table with B-tree row counting |
865298e |
6 | Single column SELECT — parse CREATE TABLE schema, scan rows, project column |
0170433 |
7 | Multiple column SELECT and SELECT * with INTEGER PRIMARY KEY rowid mapping |
b64eac5 |
8 | WHERE clause filtering with =, !=, >, < operators |
61dbf29 |
9 | Full table scan across multi-page B-tree (verified with 500-row table) |
66c73b4 |
10 | Index-based retrieval using index B-tree search and rowid lookup |
83faded |
test | BunJS test client with 15 tests covering all 10 stages |
This is a read-only educational database reader:
- INSERT / UPDATE / DELETE
- Transactions & WAL
- JOIN / GROUP BY / ORDER BY
- LIKE operator
- Concurrency