
fix(storage): JSON-serialize date/datetime metadata from YAML frontmatter #25

Merged: torresmateo merged 2 commits into ArcadeAI:main from portofcontext:fix/date-frontmatter-serialization on May 21, 2026.



pk8189 (Contributor) commented May 20, 2026

What

`Database.insert_document` and `update_document` call `json.dumps(document.metadata)` to persist the parsed frontmatter metadata. PyYAML parses unquoted frontmatter dates such as `last_push: 2026-05-19` into `datetime.date` objects, which the stdlib JSON encoder cannot serialize, so any markdown file with a date in its frontmatter fails to index:

```
TypeError: Object of type date is not JSON serializable
```
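The failure can be reproduced with the stdlib alone. This is a minimal sketch that assumes `document.metadata` is a plain dict holding what PyYAML returns for frontmatter like `last_push: 2026-05-19`; the dict literal below stands in for the parser output, with the date already converted to `datetime.date`.

```python
import json
from datetime import date

# Assumed stand-in for document.metadata after PyYAML has parsed
# `last_push: 2026-05-19`: the unquoted date is now a datetime.date.
metadata = {"title": "My note", "last_push": date(2026, 5, 19)}

error_message = None
try:
    # Same call the storage layer makes, with no default= callback.
    json.dumps(metadata)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # Object of type date is not JSON serializable
```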

Repro

This hits anyone using Obsidian, where frontmatter very commonly contains `created: YYYY-MM-DD` or similar. A minimal example:

```
---
title: My note
last_push: 2026-05-19
---
```

`librarian add /path/to/that/note.md` → `Error: ... Object of type date is not JSON serializable`.

Fix

Pass a small `default=` callback to `json.dumps` that converts `date` / `datetime` values to ISO strings and falls back to `str()` for any other unexpected type. Values are stored as strings, so they also come back as strings when read; this is acceptable because metadata is informational and is not queried as dates.

Tests

New `tests/test_database.py` with regression tests for both `insert_document` and `update_document`. Verified the tests fail on `main` (reproducing `TypeError`) and pass with this change. Existing `test_parser.py` continues to pass.
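The PR's tests go through `Database.insert_document` and `update_document`, whose setup is not shown here. As a hedged sketch, the same regression can be pinned at the serialization step alone; the test class and method names are hypothetical, and `_json_default` is an assumed copy of the helper's shape.

```python
import json
import unittest
from datetime import date, datetime
from typing import Any


def _json_default(obj: Any) -> str:
    # Assumed shape of the PR's helper, for illustration only.
    if isinstance(obj, date):
        return obj.isoformat()
    return str(obj)


class FrontmatterDateSerializationTest(unittest.TestCase):
    def test_plain_dumps_rejects_date(self) -> None:
        # Reproduces the original bug: no default= callback.
        with self.assertRaises(TypeError):
            json.dumps({"last_push": date(2026, 5, 19)})

    def test_default_callback_serializes_dates(self) -> None:
        metadata = {
            "last_push": date(2026, 5, 19),
            "edited": datetime(2026, 5, 20, 9, 30),
        }
        decoded = json.loads(json.dumps(metadata, default=_json_default))
        self.assertEqual(
            decoded,
            {"last_push": "2026-05-19", "edited": "2026-05-20T09:30:00"},
        )


if __name__ == "__main__":
    unittest.main()
```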

Notes

  • Helper is named `_json_default` and lives next to `get_effective_embedding_dimension` so other call sites that serialize metadata can reuse it if needed.
  • No new dependencies.
  • Falls under "bug fix with regression test" per CONTRIBUTING.md.


Obsidian and other markdown frontmatter commonly contain `YYYY-MM-DD`
values that PyYAML parses into `datetime.date`, e.g.:

    ---
    last_push: 2026-05-19
    ---

When `MarkdownParser` extracted the frontmatter and `Database.insert_document`
ran `json.dumps(document.metadata)`, this crashed with:

    TypeError: Object of type date is not JSON serializable

Add a small `_json_default` fallback that converts `date` / `datetime`
to ISO strings (and falls back to `str()` for anything else). Round-tripped
values come back as strings; this is acceptable because metadata is
informational and not queried as dates.

Includes a regression test that fails before this change and passes after,
covering both `insert_document` and `update_document`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
codecov Bot commented May 21, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

torresmateo (Collaborator) left a comment

This LGTM with the exception of a small Python style issue!

Comment thread: librarian/storage/database.py (outdated)
torresmateo merged commit b24c100 into ArcadeAI:main on May 21, 2026
6 checks passed