mysqlite — Build Your Own SQLite

A lightweight SQLite-compatible database reader built from scratch in Go. This project teaches SQLite internals by progressively implementing features that allow querying a real .sqlite database file.

Architecture

cmd/main.go              CLI entry point
internal/
  header/header.go       SQLite file header parsing
  btree/btree.go         B-Tree page parsing (table + index)
  record/record.go       Varint decoding, serial types, record format
  parser/parser.go       SQL SELECT parser
data/sample.db           Test SQLite database
client/test.test.ts      BunJS test suite

Quick Start

go build -o bin/mysqlite ./cmd

# CLI commands
./bin/mysqlite data/sample.db .dbinfo
./bin/mysqlite data/sample.db .tables
./bin/mysqlite data/sample.db "SELECT name FROM users;"
./bin/mysqlite data/sample.db "SELECT * FROM users WHERE country = 'Indonesia';"

Test

cd client && bun test

Stage-by-Stage Guide

Stage 1-2: Read SQLite Header & Print Page Size

Difficulty: Very Easy
Concepts: Binary file reading, endianness, SQLite header format
Code: internal/header/header.go

Every SQLite file starts with a 100-byte header. The first 16 bytes are the magic string SQLite format 3\0. The page size is stored as a big-endian uint16 at bytes 16-17.

func Parse(data []byte) (*Header, error) {
    if string(data[0:16]) != "SQLite format 3\x00" {
        return nil, fmt.Errorf("not a valid SQLite database file")
    }
    h := &Header{
        PageSize:     binary.BigEndian.Uint16(data[16:18]),
        TextEncoding: binary.BigEndian.Uint32(data[56:60]),
    }
    return h, nil
}

Key insight: SQLite uses big-endian (network byte order) for all multi-byte values in the header. The page size can be any power of 2 between 512 and 65536. A stored value of 1 means 65536.

Output:

$ ./bin/mysqlite data/sample.db .dbinfo
database page size: 4096
number of tables: 4

Stage 3-4: Count Tables & List Table Names

Difficulty: Hard
Concepts: sqlite_master schema table, B-Tree leaf traversal, cell parsing, record decoding
Code: internal/btree/btree.go, internal/record/record.go

The first page (page 1) of every SQLite database contains the sqlite_master table — a special B-Tree that stores schema information. Each row in this table describes a database object (table, index, view, trigger).

How it works:

Read page 1 (note: page 1's B-Tree header starts at offset 100, after the file header)
Parse the B-Tree page header to get the page type and cell pointer array
Each cell pointer points to a cell containing a record
Decode each record — the columns are: (type, name, tbl_name, rootpage, sql)

Varints are SQLite's variable-length integer encoding. Each byte uses 7 bits for data and 1 bit to indicate more bytes follow. Up to 9 bytes, with the 9th byte using all 8 bits.

Serial Types describe column values in a record:

Serial Type	Meaning	Size
0	NULL	0
1	8-bit int	1
2	16-bit int	2
3	24-bit int	3
4	32-bit int	4
5	48-bit int	6
6	64-bit int	8
7	IEEE 754 float	8
8	Integer 0	0
9	Integer 1	0
>=12, even	BLOB	(N-12)/2
>=13, odd	TEXT	(N-13)/2

Output:

$ ./bin/mysqlite data/sample.db .tables
users
companies
orders
bigtable

Stage 5: SELECT COUNT(*) FROM table

Difficulty: Medium
Concepts: SQL parsing, table scans, row counting
Code: internal/parser/parser.go, cmd/main.go (countRows)

The SQL parser tokenizes simple SELECT statements into an AST with columns, table name, and optional WHERE clause.

To count rows, we traverse the table's B-Tree:

Leaf pages: count the number of cells (= number of rows)
Interior pages: recursively count children + right child pointer

func countRows(r *btree.Reader, rootPage int) int {
    page := r.ReadPage(rootPage)
    ph := btree.ParseHeader(page, rootPage == 1)
    cells := btree.ReadCells(page, ph)
    if ph.Type == btree.LeafTable {
        return len(cells)
    }
    // interior: recurse into each left child + right child
    ...
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT COUNT(*) FROM users;"
5

Stage 6: Single Column SELECT

Difficulty: Hard
Concepts: Row decoding, serial types, column mapping, CREATE TABLE parsing
Code: cmd/main.go (parseColumns, scanRows, findColIdx)

To read actual data, we need to:

Parse the CREATE TABLE SQL to get column definitions
For each cell in a leaf page, decode the record payload
Map column names to their positions in the record

INTEGER PRIMARY KEY gotcha: When a column is declared as INTEGER PRIMARY KEY, SQLite uses the cell's rowid as the column value — it's NOT stored in the record payload. The record still has an entry for that column position, but it contains NULL. We detect this and substitute the rowid.

row := make([]interface{}, len(cols))
for i := range cols {
    if i < len(vals) {
        row[i] = vals[i]
    }
    if i == rowIDCol {
        row[i] = cell.RowID  // override with actual rowid
    }
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT name FROM users;"
Alice
Bob
Charlie
Diana
Eve

Stage 7: Multiple Column SELECT & SELECT *

Difficulty: Hard
Concepts: Projection, multiple column extraction, pipe-delimited output

Extends Stage 6 to support multiple columns and SELECT *. The projection logic looks up each requested column's index in the schema and extracts the corresponding value from each row.

Output:

$ ./bin/mysqlite data/sample.db "SELECT id, email FROM users;"
1|alice@email.com
2|bob@email.com
3|charlie@email.com
4|diana@email.com
5|eve@email.com

Stage 8: WHERE Clause Filtering

Difficulty: Hard
Concepts: Expression evaluation, predicate filtering, query execution
Code: internal/parser/parser.go (WhereClause), cmd/main.go (filterRows, compare)

The parser extracts WHERE conditions into a simple Column Operator Value structure. Supported operators: =, !=, >, <, >=, <=.

After scanning all rows, we apply the filter:

func filterRows(rows [][]interface{}, cols []colDef, w *WhereClause) [][]interface{} {
    colIdx := findColIdx(cols, w.Column)
    var filtered [][]interface{}
    for _, row := range rows {
        if compare(row[colIdx], w.Operator, w.Value) {
            filtered = append(filtered, row)
        }
    }
    return filtered
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT * FROM companies WHERE country = 'Indonesia';"
1|Acme Corp|Indonesia

Stage 9: Full Table Scan (Multi-Page B-Tree)

Difficulty: Hard
Concepts: B-Tree interior pages, cursor iteration, recursive traversal
Code: cmd/main.go (scanPage), internal/btree/btree.go

When a table has too many rows to fit in a single page, SQLite uses interior pages that point to child pages. The structure is:

Interior Page (type 5)
├── Cell 1: [left_child_ptr, rowid]     → points to child page
├── Cell 2: [left_child_ptr, rowid]     → points to child page
├── ...
└── Right child pointer                  → points to rightmost child

Our scan recursively traverses all child pages:

func scanPage(r *btree.Reader, pageNum int, ...) [][]interface{} {
    if ph.Type == btree.LeafTable {
        // decode all cells in this leaf
    } else if ph.Type == btree.InteriorTable {
        for _, cell := range cells {
            rows = append(rows, scanPage(r, int(cell.LeftChild), ...)...)
        }
        rows = append(rows, scanPage(r, int(ph.RightChild), ...)...)
    }
}

Verified with a 500-row table that spans multiple B-tree pages.

Output:

$ ./bin/mysqlite data/sample.db "SELECT COUNT(*) FROM bigtable;"
500

Stage 10: Index-Based Retrieval

Difficulty: Very Hard
Concepts: Index B-Trees, query planning, indexed lookup, rowid resolution
Code: cmd/main.go (findIndex, indexSearch, lookupRowInPage), internal/btree/btree.go

When a WHERE clause matches an indexed column with =, we can skip the full table scan:

Find the index — check sqlite_master for an index on the filtered column
Search the index B-Tree — index pages use types 2 (interior index) and 10 (leaf index). Index records contain [indexed_value, rowid]
Resolve rowids — for each matching rowid, do a B-Tree lookup on the table to fetch the full row

// Index leaf: decode payload to get [value, rowid]
vals := record.Decode(cell.Payload)
if fmtVal(vals[0]) == target {
    rowIDs = append(rowIDs, vals[1].(int64))
}

// Then fetch each row by rowid using table B-Tree navigation
func lookupRowInPage(r, pageNum, rowID, cols, rowIDCol) {
    // Navigate interior pages: if rowID <= cell.RowID, go left
    // At leaf page: find cell with matching rowID
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT * FROM users WHERE email = 'alice@email.com';"
1|Alice|alice@email.com|Indonesia

Test Results

$ bun test test.test.ts
bun test v1.3.11-canary.108 (a305e99a)

test.test.ts:
✓ Stage 1-2: .dbinfo > prints page size [41.30ms]
✓ Stage 1-2: .dbinfo > prints table count [19.57ms]
✓ Stage 3-4: .tables > lists user tables [16.61ms]
✓ Stage 5: COUNT(*) > counts users [18.21ms]
✓ Stage 5: COUNT(*) > counts companies [19.97ms]
✓ Stage 5: COUNT(*) > counts orders [17.58ms]
✓ Stage 5: COUNT(*) > counts bigtable [15.62ms]
✓ Stage 6: Single column SELECT > SELECT name FROM users [14.14ms]
✓ Stage 7: Multiple column SELECT > SELECT id, email FROM users [12.43ms]
✓ Stage 7: Multiple column SELECT > SELECT * FROM users [12.87ms]
✓ Stage 8: WHERE clause > filter companies by country [12.21ms]
✓ Stage 8: WHERE clause > filter users by country [12.48ms]
✓ Stage 9: Full table scan > SELECT * FROM bigtable returns 500 rows [13.24ms]
✓ Stage 10: Index retrieval > index lookup by email [13.07ms]
✓ Stage 10: Index retrieval > index lookup by country returns 3 rows [15.76ms]

 15 pass
 0 fail
 23 expect() calls
Ran 15 tests across 1 file. [391.00ms]

Commit History

Each stage was committed separately for step-by-step learning:

Commit	Stage	Description
`af7e001`	init	Project structure, CLI skeleton, sample test database
`e98de2f`	1-2	Parse SQLite file header and print page size via `.dbinfo`
`32b2a4f`	3-4	Traverse sqlite_master B-tree, count tables and list names via `.tables`
`da94204`	5	Support `SELECT COUNT(*) FROM table` with B-tree row counting
`865298e`	6	Single column SELECT — parse CREATE TABLE schema, scan rows, project column
`0170433`	7	Multiple column SELECT and `SELECT *` with INTEGER PRIMARY KEY rowid mapping
`b64eac5`	8	WHERE clause filtering with `=`, `!=`, `>`, `<` operators
`61dbf29`	9	Full table scan across multi-page B-tree (verified with 500-row table)
`66c73b4`	10	Index-based retrieval using index B-tree search and rowid lookup
`83faded`	test	BunJS test client with 15 tests covering all 10 stages

What's NOT Implemented

This is a read-only educational database reader:

INSERT / UPDATE / DELETE
Transactions & WAL
JOIN / GROUP BY / ORDER BY
LIKE operator
Concurrency

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mysqlite — Build Your Own SQLite

Architecture

Quick Start

Test

Stage-by-Stage Guide

Stage 1-2: Read SQLite Header & Print Page Size

Stage 3-4: Count Tables & List Table Names

Stage 5: SELECT COUNT(*) FROM table

Stage 6: Single Column SELECT

Stage 7: Multiple Column SELECT & SELECT *

Stage 8: WHERE Clause Filtering

Stage 9: Full Table Scan (Multi-Page B-Tree)

Stage 10: Index-Based Retrieval

Test Results

Commit History

What's NOT Implemented

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
client		client
cmd		cmd
data		data
internal		internal
.gitignore		.gitignore
README.md		README.md
go.mod		go.mod

Folders and files

Latest commit

History

Repository files navigation

mysqlite — Build Your Own SQLite

Architecture

Quick Start

Test

Stage-by-Stage Guide

Stage 1-2: Read SQLite Header & Print Page Size

Stage 3-4: Count Tables & List Table Names

Stage 5: SELECT COUNT(*) FROM table

Stage 6: Single Column SELECT

Stage 7: Multiple Column SELECT & SELECT *

Stage 8: WHERE Clause Filtering

Stage 9: Full Table Scan (Multi-Page B-Tree)

Stage 10: Index-Based Retrieval

Test Results

Commit History

What's NOT Implemented

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages