Implement the reserved tiktoken extra for accurate token counting #218

@dgenio

Summary

Ship the tiktoken-backed token counter that the packaging already promises: a
counter implementing the existing firewall/token_counting.py seam, installed via
pip install weaver-kernel[tiktoken], so context budgets count real model tokens
instead of character/byte heuristics.

Why this matters

Budgets are the firewall's scalability promise to the LLM context window, and
context windows are measured in tokens. The extra is already declared in
pyproject.toml (tiktoken = ["tiktoken>=0.6"]) and the protocol seam exists with
a docstring naming tiktoken as the intended example — but installing the extra today
buys nothing, which is a small broken promise to anyone who reads the metadata.
Accurate counting makes budget downgrades fire at the right thresholds for real
deployments.

Current evidence

  • pyproject.toml:68: tiktoken = ["tiktoken>=0.6"] — declared, unused anywhere in src/.
  • firewall/token_counting.py:4: docstring says the module exists to plug "token counters (for example, a tiktoken-based one)" into the firewall.
  • firewall/budgets.py and firewall/transform.py currently drive decisions off byte-size estimates (see ISSUE 37).

External context

tiktoken is the de-facto tokenizer for OpenAI-family models. Anthropic models
tokenize differently, so the design should name the encoding explicitly rather
than guess per model.

Proposed implementation

  1. Add firewall/token_counting_tiktoken.py (lazy import; helpful ImportError
    message naming the extra) implementing the existing counter protocol with a
    configurable encoding (default cl100k_base or o200k_base — decide and
    document).
  2. Wire selection through the already-shaped Firewall(token_counter=...) seam
    and document how to construct the counter.
  3. Cache encoder instances (expensive to build); counting must stay deterministic.
  4. Bare-install safety: the module must not import tiktoken at package import time
    (pairs with ISSUE 19's no-extras CI job).

AI-agent execution notes

  • Inspect first: firewall/token_counting.py (protocol), firewall/transform.py and firewall/budgets.py (consumption), otel.py (pattern for optional-extra lazy import), tests/test_firewall.py.
  • Edge cases: non-string data (count serialized form? document), very large strings (chunked counting), unknown encoding names (typed error).
  • Follow the existing optional-dependency seam pattern exactly (mcp/otel precedents).
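One possible way to handle the non-string and very-large-string edge cases is sketched below. The helper names to_countable_text and count_in_chunks are hypothetical, and serializing non-string data as sorted-key JSON is only one candidate policy for the "count serialized form?" question, not a decision.

```python
import json
from typing import Any, Callable


def to_countable_text(data: Any) -> str:
    """Turn arbitrary data into the text whose tokens get counted."""
    if isinstance(data, str):
        return data
    if isinstance(data, (bytes, bytearray)):
        return bytes(data).decode("utf-8", errors="replace")
    # sort_keys keeps the serialized form, and so the count, deterministic.
    return json.dumps(data, sort_keys=True, ensure_ascii=False, default=str)


def count_in_chunks(
    text: str,
    count_tokens: Callable[[str], int],
    chunk_chars: int = 100_000,
) -> int:
    """Sum token counts over fixed-size character slices of text."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    return sum(
        count_tokens(text[start:start + chunk_chars])
        for start in range(0, len(text), chunk_chars)
    )
```

Chunking bounds memory use per call, but a real tokenizer may merge characters across a slice boundary, so the chunked total can differ slightly from counting the whole string at once. If chunking is adopted, that approximation should be documented.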

Acceptance criteria

  • With the extra installed, budgets use real token counts (tested with known strings and expected counts).
  • Without the extra, behavior is unchanged and importing the public API does not require tiktoken.
  • Helpful error when explicitly requesting the tiktoken counter without the extra.

Test plan

Mark the tiktoken tests so they skip without the extra; make sure a CI job
installs it (via the dev extras or by adding it explicitly); have the
bare-install job assert there is no import-time dependency. Run make ci.
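The bare-install assertion could be checked roughly as follows. This is a sketch, not the project's existing CI helper: the function name is hypothetical, and the real job would pass the package's actual import name. It runs the import in a fresh interpreter so modules already loaded by the test process cannot mask the result.

```python
import subprocess
import sys


def imports_module_at_import_time(package: str, forbidden: str) -> bool:
    """Return True if importing `package` also loads `forbidden`."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print({forbidden!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # Read only the last line in case the package prints during import.
    return result.stdout.strip().splitlines()[-1] == "True"
```

For the tests that need the extra, pytest.importorskip("tiktoken") at the top of the test module is one standard way to skip cleanly when it is absent.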

Documentation plan

docs/context_firewall.md budgets section; README extras table; CHANGELOG Added.

Migration and compatibility notes

Opt-in; defaults unchanged. Budget thresholds calibrated for byte counts may need
retuning when switching to token counts — document the difference.
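To make the retuning concrete, here is a hedged sketch that measures a bytes-per-token ratio on representative payloads and rescales an existing byte budget. Both helper names are hypothetical, and the ratio depends entirely on the samples and encoding used, so it should be measured on real traffic rather than assumed.

```python
from typing import Callable, Iterable


def bytes_per_token(
    samples: Iterable[str],
    count_tokens: Callable[[str], int],
) -> float:
    """Average UTF-8 bytes per token across representative samples."""
    total_bytes = 0
    total_tokens = 0
    for sample in samples:
        total_bytes += len(sample.encode("utf-8"))
        total_tokens += count_tokens(sample)
    if total_tokens == 0:
        raise ValueError("samples produced no tokens")
    return total_bytes / total_tokens


def retune_budget(byte_budget: int, ratio: float) -> int:
    """Convert a byte-calibrated budget into an approximate token budget."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return max(1, int(byte_budget / ratio))
```

With a measured ratio of 4.5 bytes per token, for example, a 9000-byte budget maps to a 2000-token budget. The result is a starting point for tuning, not a replacement for checking where downgrades actually fire.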

Risks and tradeoffs

tiktoken is a heavyweight binary dependency (hence extra-only); encoding choice can
mislead for non-OpenAI models — explicit configuration and docs mitigate.

Suggested labels

ai, llm, product, performance
