Forkserver child processes start with no logging configured (embed/ingest logs lost) #319

@bygadd

When the backend uses the forkserver start method, log records from child processes — the embedding/ingest path (ccb.injest, doc_loader, models, vectordb, network_em) — never reach the configured handlers; only the parent process logs. Per-request indexing/embedding activity is effectively invisible, which makes indexing problems hard to diagnose.

Cause: setup_logging() runs inside the if __name__ == '__main__' block in main.py (~L50–53), and the start method is set to forkserver just after. Forkserver children import main.py under the module name __mp_main__ rather than __main__, so the guarded block is skipped: they never run setup_logging() and start with default, unconfigured logging.

Suggested fix: relay child records to the parent's handlers — a multiprocessing.Queue + logging.handlers.QueueListener on the parent, and a QueueHandler installed in each child (wired in the child entrypoint / exception wrapper, with the queue passed into the spawned process). After wiring this on our 5.3.x deployment, child PIDs and ccb.injest lines appeared in the log.
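To make the suggested wiring concrete, here is a minimal sketch of the queue relay using only the standard library. The helper names (`start_parent_listener`, `install_child_handler`, `child_entrypoint`, `demo`) and the `doc_id` argument are hypothetical and not taken from the project; the real child entrypoint and handler set would differ.

```python
import logging
import logging.handlers
import multiprocessing as mp


def start_parent_listener(log_queue, *handlers):
    """Parent side: drain log_queue on a background thread into the given handlers."""
    listener = logging.handlers.QueueListener(
        log_queue, *handlers, respect_handler_level=True
    )
    listener.start()
    return listener


def install_child_handler(log_queue, level=logging.INFO):
    """Child side: route every record from this process into log_queue."""
    root = logging.getLogger()
    # Drop any existing root handlers so records are not emitted twice.
    for handler in list(root.handlers):
        root.removeHandler(handler)
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(level)


def child_entrypoint(log_queue, doc_id):
    """Hypothetical worker target; stands in for the real embed/ingest entrypoint."""
    install_child_handler(log_queue)
    logging.getLogger("ccb.injest").info("indexing %s", doc_id)


def demo():
    """Start one forkserver child and relay its log record to a console handler."""
    ctx = mp.get_context("forkserver")  # POSIX only
    log_queue = ctx.Queue()
    console = logging.StreamHandler()
    console.setFormatter(
        logging.Formatter("%(process)d %(name)s %(levelname)s %(message)s")
    )
    listener = start_parent_listener(log_queue, console)
    try:
        proc = ctx.Process(target=child_entrypoint, args=(log_queue, "doc-1"))
        proc.start()
        proc.join()
    finally:
        # stop() processes the records already queued before it returns.
        listener.stop()
```

`demo()` should be called from under an `if __name__ == "__main__":` guard in a script, since forkserver children re-import the main module. In the actual backend, the handlers passed to `start_parent_listener` would presumably be the ones setup_logging() already configures, so child records keep the same format and destinations as parent records.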

(Disclosure: investigated with AI assistance; verified on a live deployment.)
