Context
The CSV endpoints build the entire response in memory (StreamingResponse(iter([csv_str])), src/mavedb/routers/score_sets.py) and return it as a fake stream. This is the memory bomb that let a ChatGPT-User crawler pull full, unpaginated CSVs and saturate prod-api. Full-CSV download is a legitimate feature (the UI download button pulls with no pagination), so we stream rather than cap rows.
Part of the effort to make /scores & /counts safe to reopen to AI agents.
Scope
- Extract _derive_csv_columns(...) from get_score_set_variants_as_csv (src/mavedb/lib/score_sets.py), behavior-preserving, so the buffered and streaming paths share column logic.
- Add stream_score_set_variants_as_csv(...): write the header, iterate the Variant query with stream_results=True, and flush ~1k-row batches via variant_to_csv_row.
- Reuse the existing db session (no extra session).
- Make drop_na_columns streamable via a single up-front aggregate query (bool_or(...) over hgvs_nt/hgvs_splice/hgvs_pro, with a Postgres regex mirroring is_null) instead of the all-rows Python scan; use DictWriter(extrasaction='ignore').
Acceptance criteria
- Tests cover drop_na_columns, including a test case with an all-"NA" hgvs column.
- Memory use for /scores & /counts no longer scales with full CSV size (spot-checked).
Dependencies
None. Highest leverage — ship first.
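
Sketch
The streaming path described under Scope (header first, then ~1k-row batches) could look roughly like the following. This is a minimal illustration under stated assumptions, not the actual implementation: stream_csv_batches and its parameters are hypothetical names, plain dicts stand in for the output of variant_to_csv_row, and any iterable stands in for the Variant query run with stream_results=True.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def stream_csv_batches(
    rows: Iterable[Dict[str, object]],
    columns: List[str],
    batch_size: int = 1000,
) -> Iterator[str]:
    """Yield CSV text in chunks: the header first, then one chunk per batch.

    Hypothetical helper for illustration; `rows` stands in for a streamed
    query result that has already been converted to dicts.
    """
    buffer = io.StringIO()
    # extrasaction="ignore" skips dict keys that are not in `columns`, so
    # dropping a column only requires leaving it out of `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")

    # Emit the header immediately so the client receives bytes right away.
    writer.writeheader()
    yield buffer.getvalue()
    buffer.seek(0)
    buffer.truncate(0)

    pending = 0
    for row in rows:
        writer.writerow(row)
        pending += 1
        if pending >= batch_size:
            # Flush the batch and reset the buffer, so memory stays bounded
            # by one batch instead of the whole file.
            yield buffer.getvalue()
            buffer.seek(0)
            buffer.truncate(0)
            pending = 0

    # Flush whatever is left after the last full batch.
    if pending:
        yield buffer.getvalue()
```

Because the generator yields strings lazily, it could be passed to StreamingResponse in place of iter([csv_str]), keeping peak memory at roughly one batch. Columns that the up-front aggregate query finds to be entirely NA would simply be left out of columns, and extrasaction='ignore' takes care of the rest.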