Skip to content

Haslab-dev/BuildYourOwn-SQLite

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mysqlite — Build Your Own SQLite

A lightweight SQLite-compatible database reader built from scratch in Go. This project teaches SQLite internals by progressively implementing features that allow querying a real .sqlite database file.

Architecture

cmd/main.go              CLI entry point
internal/
  header/header.go       SQLite file header parsing
  btree/btree.go         B-Tree page parsing (table + index)
  record/record.go       Varint decoding, serial types, record format
  parser/parser.go       SQL SELECT parser
data/sample.db           Test SQLite database
client/test.test.ts      BunJS test suite

Quick Start

go build -o bin/mysqlite ./cmd

# CLI commands
./bin/mysqlite data/sample.db .dbinfo
./bin/mysqlite data/sample.db .tables
./bin/mysqlite data/sample.db "SELECT name FROM users;"
./bin/mysqlite data/sample.db "SELECT * FROM users WHERE country = 'Indonesia';"

Test

cd client && bun test

Stage-by-Stage Guide

Stage 1-2: Read SQLite Header & Print Page Size

Difficulty: Very Easy
Concepts: Binary file reading, endianness, SQLite header format
Code: internal/header/header.go

Every SQLite file starts with a 100-byte header. The first 16 bytes are the magic string SQLite format 3\0. The page size is stored as a big-endian uint16 at bytes 16-17.

func Parse(data []byte) (*Header, error) {
    if string(data[0:16]) != "SQLite format 3\x00" {
        return nil, fmt.Errorf("not a valid SQLite database file")
    }
    h := &Header{
        PageSize:     binary.BigEndian.Uint16(data[16:18]),
        TextEncoding: binary.BigEndian.Uint32(data[56:60]),
    }
    return h, nil
}

Key insight: SQLite uses big-endian (network byte order) for all multi-byte values in the header. The page size can be any power of 2 between 512 and 65536. A stored value of 1 means 65536.

Output:

$ ./bin/mysqlite data/sample.db .dbinfo
database page size: 4096
number of tables: 4

Stage 3-4: Count Tables & List Table Names

Difficulty: Hard
Concepts: sqlite_master schema table, B-Tree leaf traversal, cell parsing, record decoding
Code: internal/btree/btree.go, internal/record/record.go

The first page (page 1) of every SQLite database contains the sqlite_master table — a special B-Tree that stores schema information. Each row in this table describes a database object (table, index, view, trigger).

How it works:

  1. Read page 1 (note: page 1's B-Tree header starts at offset 100, after the file header)
  2. Parse the B-Tree page header to get the page type and cell pointer array
  3. Each cell pointer points to a cell containing a record
  4. Decode each record — the columns are: (type, name, tbl_name, rootpage, sql)

Varints are SQLite's variable-length integer encoding. Each byte uses 7 bits for data and 1 bit to indicate more bytes follow. Up to 9 bytes, with the 9th byte using all 8 bits.

Serial Types describe column values in a record:

Serial Type Meaning Size
0 NULL 0
1 8-bit int 1
2 16-bit int 2
3 24-bit int 3
4 32-bit int 4
5 48-bit int 6
6 64-bit int 8
7 IEEE 754 float 8
8 Integer 0 0
9 Integer 1 0
>=12, even BLOB (N-12)/2
>=13, odd TEXT (N-13)/2

Output:

$ ./bin/mysqlite data/sample.db .tables
users
companies
orders
bigtable

Stage 5: SELECT COUNT(*) FROM table

Difficulty: Medium
Concepts: SQL parsing, table scans, row counting
Code: internal/parser/parser.go, cmd/main.go (countRows)

The SQL parser tokenizes simple SELECT statements into an AST with columns, table name, and optional WHERE clause.

To count rows, we traverse the table's B-Tree:

  • Leaf pages: count the number of cells (= number of rows)
  • Interior pages: recursively count children + right child pointer
func countRows(r *btree.Reader, rootPage int) int {
    page := r.ReadPage(rootPage)
    ph := btree.ParseHeader(page, rootPage == 1)
    cells := btree.ReadCells(page, ph)
    if ph.Type == btree.LeafTable {
        return len(cells)
    }
    // interior: recurse into each left child + right child
    ...
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT COUNT(*) FROM users;"
5

Stage 6: Single Column SELECT

Difficulty: Hard
Concepts: Row decoding, serial types, column mapping, CREATE TABLE parsing
Code: cmd/main.go (parseColumns, scanRows, findColIdx)

To read actual data, we need to:

  1. Parse the CREATE TABLE SQL to get column definitions
  2. For each cell in a leaf page, decode the record payload
  3. Map column names to their positions in the record

INTEGER PRIMARY KEY gotcha: When a column is declared as INTEGER PRIMARY KEY, SQLite uses the cell's rowid as the column value — it's NOT stored in the record payload. The record still has an entry for that column position, but it contains NULL. We detect this and substitute the rowid.

row := make([]interface{}, len(cols))
for i := range cols {
    if i < len(vals) {
        row[i] = vals[i]
    }
    if i == rowIDCol {
        row[i] = cell.RowID  // override with actual rowid
    }
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT name FROM users;"
Alice
Bob
Charlie
Diana
Eve

Stage 7: Multiple Column SELECT & SELECT *

Difficulty: Hard
Concepts: Projection, multiple column extraction, pipe-delimited output

Extends Stage 6 to support multiple columns and SELECT *. The projection logic looks up each requested column's index in the schema and extracts the corresponding value from each row.

Output:

$ ./bin/mysqlite data/sample.db "SELECT id, email FROM users;"
1|alice@email.com
2|bob@email.com
3|charlie@email.com
4|diana@email.com
5|eve@email.com

Stage 8: WHERE Clause Filtering

Difficulty: Hard
Concepts: Expression evaluation, predicate filtering, query execution
Code: internal/parser/parser.go (WhereClause), cmd/main.go (filterRows, compare)

The parser extracts WHERE conditions into a simple Column Operator Value structure. Supported operators: =, !=, >, <, >=, <=.

After scanning all rows, we apply the filter:

func filterRows(rows [][]interface{}, cols []colDef, w *WhereClause) [][]interface{} {
    colIdx := findColIdx(cols, w.Column)
    var filtered [][]interface{}
    for _, row := range rows {
        if compare(row[colIdx], w.Operator, w.Value) {
            filtered = append(filtered, row)
        }
    }
    return filtered
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT * FROM companies WHERE country = 'Indonesia';"
1|Acme Corp|Indonesia

Stage 9: Full Table Scan (Multi-Page B-Tree)

Difficulty: Hard
Concepts: B-Tree interior pages, cursor iteration, recursive traversal
Code: cmd/main.go (scanPage), internal/btree/btree.go

When a table has too many rows to fit in a single page, SQLite uses interior pages that point to child pages. The structure is:

Interior Page (type 5)
├── Cell 1: [left_child_ptr, rowid]     → points to child page
├── Cell 2: [left_child_ptr, rowid]     → points to child page
├── ...
└── Right child pointer                  → points to rightmost child

Our scan recursively traverses all child pages:

func scanPage(r *btree.Reader, pageNum int, ...) [][]interface{} {
    if ph.Type == btree.LeafTable {
        // decode all cells in this leaf
    } else if ph.Type == btree.InteriorTable {
        for _, cell := range cells {
            rows = append(rows, scanPage(r, int(cell.LeftChild), ...)...)
        }
        rows = append(rows, scanPage(r, int(ph.RightChild), ...)...)
    }
}

Verified with a 500-row table that spans multiple B-tree pages.

Output:

$ ./bin/mysqlite data/sample.db "SELECT COUNT(*) FROM bigtable;"
500

Stage 10: Index-Based Retrieval

Difficulty: Very Hard
Concepts: Index B-Trees, query planning, indexed lookup, rowid resolution
Code: cmd/main.go (findIndex, indexSearch, lookupRowInPage), internal/btree/btree.go

When a WHERE clause matches an indexed column with =, we can skip the full table scan:

  1. Find the index — check sqlite_master for an index on the filtered column
  2. Search the index B-Tree — index pages use types 2 (interior index) and 10 (leaf index). Index records contain [indexed_value, rowid]
  3. Resolve rowids — for each matching rowid, do a B-Tree lookup on the table to fetch the full row
// Index leaf: decode payload to get [value, rowid]
vals := record.Decode(cell.Payload)
if fmtVal(vals[0]) == target {
    rowIDs = append(rowIDs, vals[1].(int64))
}

// Then fetch each row by rowid using table B-Tree navigation
func lookupRowInPage(r, pageNum, rowID, cols, rowIDCol) {
    // Navigate interior pages: if rowID <= cell.RowID, go left
    // At leaf page: find cell with matching rowID
}

Output:

$ ./bin/mysqlite data/sample.db "SELECT * FROM users WHERE email = 'alice@email.com';"
1|Alice|alice@email.com|Indonesia

Test Results

$ bun test test.test.ts
bun test v1.3.11-canary.108 (a305e99a)

test.test.ts:
✓ Stage 1-2: .dbinfo > prints page size [41.30ms]
✓ Stage 1-2: .dbinfo > prints table count [19.57ms]
✓ Stage 3-4: .tables > lists user tables [16.61ms]
✓ Stage 5: COUNT(*) > counts users [18.21ms]
✓ Stage 5: COUNT(*) > counts companies [19.97ms]
✓ Stage 5: COUNT(*) > counts orders [17.58ms]
✓ Stage 5: COUNT(*) > counts bigtable [15.62ms]
✓ Stage 6: Single column SELECT > SELECT name FROM users [14.14ms]
✓ Stage 7: Multiple column SELECT > SELECT id, email FROM users [12.43ms]
✓ Stage 7: Multiple column SELECT > SELECT * FROM users [12.87ms]
✓ Stage 8: WHERE clause > filter companies by country [12.21ms]
✓ Stage 8: WHERE clause > filter users by country [12.48ms]
✓ Stage 9: Full table scan > SELECT * FROM bigtable returns 500 rows [13.24ms]
✓ Stage 10: Index retrieval > index lookup by email [13.07ms]
✓ Stage 10: Index retrieval > index lookup by country returns 3 rows [15.76ms]

 15 pass
 0 fail
 23 expect() calls
Ran 15 tests across 1 file. [391.00ms]

Commit History

Each stage was committed separately for step-by-step learning:

Commit Stage Description
af7e001 init Project structure, CLI skeleton, sample test database
e98de2f 1-2 Parse SQLite file header and print page size via .dbinfo
32b2a4f 3-4 Traverse sqlite_master B-tree, count tables and list names via .tables
da94204 5 Support SELECT COUNT(*) FROM table with B-tree row counting
865298e 6 Single column SELECT — parse CREATE TABLE schema, scan rows, project column
0170433 7 Multiple column SELECT and SELECT * with INTEGER PRIMARY KEY rowid mapping
b64eac5 8 WHERE clause filtering with =, !=, >, < operators
61dbf29 9 Full table scan across multi-page B-tree (verified with 500-row table)
66c73b4 10 Index-based retrieval using index B-tree search and rowid lookup
83faded test BunJS test client with 15 tests covering all 10 stages

What's NOT Implemented

This is a read-only educational database reader:

  • INSERT / UPDATE / DELETE
  • Transactions & WAL
  • JOIN / GROUP BY / ORDER BY
  • LIKE operator
  • Concurrency

References

About

A lightweight SQLite-compatible database reader built from scratch in Go. This project teaches SQLite internals by progressively implementing features that allow querying a real `.sqlite` database file.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors