Commit 84ed278

chore(ingestion): enable basedpyright across the codebase via baseline (#27755)
* chore(ingestion): enable basedpyright across the codebase via baseline

Removes the ~25 paths from `[tool.basedpyright] ignore` (which excluded roughly 90% of the codebase from type checking) and grandfathers the existing violations into a baseline file. New violations in any previously-ignored file now fail CI.

Changes:

- ingestion/pyproject.toml: drop the entire `ignore = [...]` block
- ingestion/setup.py: bump `basedpyright~=1.14` to `~=1.39.0`
- ingestion/.basedpyright/baseline.json (new, ~13MB): captures the starting violation set (~18.8K errors + ~37.4K warnings) so the migration is behavior-preserving. Regenerate with `cd ingestion && basedpyright -p pyproject.toml --baselinefile .basedpyright/baseline.json --writebaseline`. basedpyright analysis has minor non-determinism (similar to ruff's), so re-running --writebaseline a few times converges the baseline.
- ingestion/noxfile.py: pass `--baselinefile .basedpyright/baseline.json` to the basedpyright invocation in the `static-checks` session so CI honors the grandfathering. CI already runs the session via `cd ingestion && nox --no-venv -s static-checks` (py-tests.yml).
- ingestion/Makefile: `make static-checks` now delegates to `nox -s static-checks` so local invocations match CI exactly. Also drops the dead Python 3.9 / OM_SKIP_SDK_PY39 branch (we require Python >=3.10 since the previous modernization PR).
- .gitignore: add `.serena/` (local language-server cache)

* chore(ingestion): add nox to the dev dependency set

The static-checks Makefile target and the py-tests CI job both delegate to `nox -s static-checks`, but nox was being installed as a separate side step (`pip install nox` in `install_dev_env`, `uv pip install nox` in the test-environment composite action). Listing it in dev extras means a plain `pip install ingestion[dev]` brings it in.
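The check and regeneration commands quoted above differ only in the trailing `--writebaseline` flag. A minimal sketch of how a helper might assemble both, assuming it is run from the `ingestion/` directory; the function name `basedpyright_args` is hypothetical and is not taken from the repository's noxfile:

```python
from __future__ import annotations

import shlex

# Path quoted in the commit message, relative to ingestion/.
BASELINE_FILE = ".basedpyright/baseline.json"


def basedpyright_args(write_baseline: bool = False) -> list[str]:
    """Build the argv for a basedpyright run (hypothetical helper)."""
    args = ["basedpyright", "-p", "pyproject.toml", "--baselinefile", BASELINE_FILE]
    if write_baseline:
        # Regeneration: rewrite the baseline file instead of only checking against it.
        args.append("--writebaseline")
    return args


if __name__ == "__main__":
    # Print both command lines so they can be compared with the commit message.
    print(shlex.join(basedpyright_args()))
    print(shlex.join(basedpyright_args(write_baseline=True)))
```

Keeping the two invocations in one place makes it harder for the local and CI commands to drift apart, which is the same goal the Makefile change pursues by delegating to nox.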
* chore(ingestion): pin basedpyright analysis to py3.10; CI runs once

Following the basedpyright + multi-Python-version research:

- ingestion/pyproject.toml: add `pythonVersion = "3.10"` to [tool.basedpyright] so type-checking always analyzes for the lowest supported Python version. Forward-incompatible code (tomllib usage, PEP 695 generics, etc.) is caught at type-check time regardless of which Python interpreter runs the checker.
- .github/workflows/py-tests.yml: gate the "Run Static Checks" step on `matrix.py-version == '3.10'`. With pythonVersion pinned, results are identical across the matrix; running once avoids redundant work and keeps the baseline file deterministic. Unit tests still run on the full 3.10/3.11/3.12 matrix to verify runtime compatibility.
- ingestion/.basedpyright/baseline.json: regenerated cleanly with the new pythonVersion config (~18.8K errors / ~37.3K warnings, similar scale to the previous baseline).

Aligns with the canonical type-check-on-floor / test-on-matrix pattern used by Pydantic, CPython, and other major Python projects.

* chore(ingestion): pin basedpyright pythonPlatform to Linux + regen baseline

CI's previous run still surfaced ~9 issues (2 errors + 7 warnings) that weren't in the baseline. Root cause: my local environment differs from CI's in three ways that affect type inference: Python interpreter (3.11 vs 3.10), platform (Darwin vs Linux), and pip-resolved package versions (couchbase, avro, trino, sqlalchemy stubs all differ slightly).

This commit closes the platform gap and regenerates the baseline from a fresh CI-equivalent environment:

- ingestion/pyproject.toml: add `pythonPlatform = "Linux"` to [tool.basedpyright] so type-checking uses the Linux subset of stdlib / third-party stubs regardless of where the analyzer runs.
- ingestion/.basedpyright/baseline.json: regenerated against a fresh Python 3.10 venv installed via `uv pip install ingestion[test]` (the same install path CI's setup-openmetadata-test-environment composite action uses). New scale: ~18.7K errors / ~37.5K warnings, the same ballpark as the previous baseline, with column positions now matching CI's environment.

Local-developer note: when running `make static-checks` from a venv that doesn't mirror CI exactly (e.g. macOS, Python 3.11, different package versions), you may see drift errors. The supported workflow for regenerating the baseline is to mirror CI:

    python3.10 -m venv /tmp/ci-mirror
    source /tmp/ci-mirror/bin/activate
    uv pip install --upgrade pip "setuptools<81"
    uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
    uv pip install -e "ingestion[test]"
    uv pip install "basedpyright~=1.39.0" nox
    cd ingestion && basedpyright -p pyproject.toml \
        --baselinefile .basedpyright/baseline.json --writebaseline

* chore(ingestion): drop pythonPlatform pin and regen baseline from CI-mirror

The previous attempt added `pythonPlatform = "Linux"` thinking it would make the local-generated baseline match CI. It did the opposite: Linux platform stubs activate additional conditional code paths that weren't analyzed before, so CI saw 101 errors instead of the prior 2 errors.

Reverting:

- Drop `pythonPlatform = "Linux"` from [tool.basedpyright]. Without it, basedpyright analyzes for the host platform; on CI's ubuntu-latest runner that's Linux automatically, but type-stub coverage stays the same as before (matching the d9196df baseline).
- Regenerate ingestion/.basedpyright/baseline.json against a fresh Python 3.10 venv installed via `uv pip install ingestion[test]` (mirroring CI's setup-openmetadata-test-environment composite action). ~18.8K errors / 37.7K warnings captured, the same scale as the working d9196df version.
Local-developer note: any baseline regeneration done on macOS will drift from CI's Linux env (different transitive package versions, different stubs). The supported local mirror procedure:

    python3.10 -m venv /tmp/ci-mirror
    source /tmp/ci-mirror/bin/activate
    uv pip install --upgrade pip "setuptools<81"
    uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
    uv pip install -e "ingestion[test]"
    uv pip install "basedpyright~=1.39.0" nox
    cd ingestion && basedpyright -p pyproject.toml \
        --baselinefile .basedpyright/baseline.json --writebaseline

* chore(ingestion): regen baseline from full CI install (mac arm64 mirror)

Prior CI-mirror only installed [test], skipping [all] and the four --no-deps SA pins (sqlalchemy-redshift/databricks/ibmi, pydoris-custom). That left ~75 connector packages out of the analysis env, so basedpyright couldn't resolve types from databricks.sqlalchemy, GE 0.18 Batch, sklearn BaseEstimator, airflow SQLAlchemy models, pandas/numpy stubs, etc. CI saw 129 errors absent from the baseline.

Regenerated against a fresh py3.10 venv that mirrors .github/actions/setup-openmetadata-test-environment exactly:

    uv pip install ./ingestion[dev]
    make generate
    uv pip install "setuptools<81"
    uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
    uv pip install --no-deps sqlalchemy-redshift==0.8.14 \
        sqlalchemy-databricks==0.2.0 \
        sqlalchemy-ibmi==0.9.3 \
        pydoris-custom==1.1.0
    uv pip install ./ingestion[all]
    uv pip install ./ingestion[test]
    uv pip install nox

First run: 128 errors, 272 warnings (within 1 error of CI's 129/272). Wrote baseline with 56,100 entries across 1,035 files. Verify run with the new baseline reports 0/0/0.

macOS arm64 vs Linux x86_64 wheel resolution may leave a small residual (~3-7 errors per the d9196df precedent). Re-run --writebaseline 2-3x if any show up in CI.
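Figures such as "56,100 entries across 1,035 files" can be cross-checked after a regeneration by counting entries in the baseline file. The sketch below assumes a layout of `{"files": {path: [entry, ...]}}` with a `code` field on each entry; that layout is an assumption made for illustration and should be confirmed against the real `baseline.json` before relying on it. The helper name `summarize_baseline` is hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter


def summarize_baseline(baseline: dict) -> tuple[int, int, Counter[str]]:
    """Return (total entries, file count, per-rule counts) for a baseline dict.

    Assumed layout (not verified here): {"files": {path: [{"code": ...}, ...]}}.
    """
    files = baseline.get("files", {})
    rule_counts: Counter[str] = Counter()
    for entries in files.values():
        for entry in entries:
            # Entries without a rule code are grouped under a placeholder key.
            rule_counts[entry.get("code", "<unknown>")] += 1
    return sum(rule_counts.values()), len(files), rule_counts


if __name__ == "__main__":
    # Tiny made-up sample; a real run would json.load() the baseline file instead.
    sample = json.loads(
        '{"files": {"./a.py": [{"code": "reportAny"}, {"code": "reportAny"}],'
        ' "./b.py": [{"code": "reportMissingImports"}]}}'
    )
    total, n_files, by_rule = summarize_baseline(sample)
    print(f"{total} entries across {n_files} files")
    for rule, count in by_rule.most_common():
        print(f"  {rule}: {count}")
```

Comparing these totals between a locally generated baseline and one produced in a CI-equivalent environment gives a quick sense of how much platform drift remains.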
* chore(ingestion): silence avro.py:95 basedpyright residual

CI's Linux fastavro stub returns Schema as `str | List[Any]`, while the macOS arm64 wheel narrows to `str`. This is the only error not absorbed by the regenerated baseline. Add a targeted `pyright: ignore` on the parse_avro_schema call instead of broadening behavior.

* chore(ingestion): tolerate cross-platform pyright ignore drift

CI's `--baselinemode=lock` (default) requires the baseline to match exactly, neither up nor down. Two related issues:

1. The avro.py `pyright: ignore` comment silenced not just the surfaced error but 10 cascading entries at line 95 (sub-errors propagating from the unresolved `schema` arg type). Baseline went `down by 10` → lock violated → exit 3 even with `0 errors` reported. Regenerate baseline so the 10 stale entries are dropped.
2. The macOS arm64 fastavro stub doesn't surface that error in the first place, so basedpyright reports the ignore comment as `reportUnnecessaryTypeIgnoreComment` locally, causing the opposite lock mismatch on CI (a warning entry that doesn't exist there). Disable the rule so platform-specific residuals can land without flapping between local and CI.

* chore(ingestion): use --baselinemode=discard for cross-platform tolerance

CI's implicit default is `lock`, which fails on any baseline change in either direction (errors going up *or* down) via console.error → exit 3. That cannot accommodate macOS arm64 vs Linux x86_64 stub drift: a baseline regenerated locally always carries some entries that don't fire on CI (and vice versa).

`auto` would tolerate the drift but silently overwrites the baseline file, which is unacceptable in CI, where unreviewed changes never get committed back.

`discard` is the right balance:

- New errors not in the baseline still fail the run (early-return path in BaselineHandler.write before the lock/discard branch).
- Stale baseline entries (errors that no longer fire on the current platform) print an info message and exit 0.
- The baseline file is never modified.
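The three baseline modes can be summarized as a small decision table. The sketch below models only the behavior described in this commit message; it is not basedpyright's implementation. The names `Outcome` and `baseline_outcome` are hypothetical, and the exit code 1 used for newly reported errors is an assumption (the message only states that such runs fail).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    exit_code: int
    baseline_rewritten: bool


def baseline_outcome(mode: str, new_errors: int, stale_entries: int) -> Outcome:
    """Model how a run ends for a given mode, per the commit message above."""
    if mode not in {"lock", "auto", "discard"}:
        raise ValueError(f"unknown baseline mode: {mode!r}")
    if new_errors > 0:
        # New errors fail the run in every mode and never touch the baseline.
        # Exit code 1 is an assumption, not stated in the commit message.
        return Outcome(exit_code=1, baseline_rewritten=False)
    if stale_entries == 0:
        # Baseline matches exactly: nothing to decide.
        return Outcome(exit_code=0, baseline_rewritten=False)
    if mode == "lock":
        # Baseline shrank: the lock is violated even though 0 errors are reported.
        return Outcome(exit_code=3, baseline_rewritten=False)
    if mode == "auto":
        # Drift is tolerated, but the baseline file is silently overwritten.
        return Outcome(exit_code=0, baseline_rewritten=True)
    # discard: stale entries are reported as info and the file is left alone.
    return Outcome(exit_code=0, baseline_rewritten=False)


if __name__ == "__main__":
    for mode in ("lock", "auto", "discard"):
        print(mode, baseline_outcome(mode, new_errors=0, stale_entries=10))
```

The avro.py case from the previous commit corresponds to `baseline_outcome("lock", 0, 10)`, which yields exit code 3 in this model.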
1 parent d7e191c commit 84ed278

8 files changed

Lines changed: 450910 additions & 46 deletions

.github/workflows/py-tests.yml

Lines changed: 5 additions & 0 deletions
@@ -99,6 +99,11 @@ jobs:
           install-server: 'false'
 
       - name: Run Static Checks
+        # basedpyright is configured with `pythonVersion = "3.10"` (the lowest
+        # supported version) so type-checking results are identical across the
+        # 3.10/3.11/3.12 matrix. Run on the lowest version only to avoid
+        # redundant work and keep the baseline file deterministic.
+        if: matrix.py-version == '3.10'
         run: |
           source env/bin/activate
           cd ingestion
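Because the checker analyzes for Python 3.10 regardless of the interpreter that runs it, code using newer standard-library modules has to be guarded by a version check to stay clean. A minimal sketch of that pattern using `tomllib` (added in Python 3.11); the helper `parse_toml` and the flag `HAS_TOMLLIB` are hypothetical names, not code from this repository:

```python
from __future__ import annotations

import sys

# pyright-family checkers narrow on sys.version_info comparisons, so when
# analyzing for 3.10 the 3.11-only branch is treated as unreachable and the
# import is not reported; an unguarded `import tomllib` would be.
if sys.version_info >= (3, 11):
    import tomllib

    HAS_TOMLLIB = True
else:
    HAS_TOMLLIB = False


def parse_toml(text: str) -> dict:
    """Parse TOML text when the running interpreter ships tomllib."""
    if sys.version_info >= (3, 11):
        return tomllib.loads(text)
    raise RuntimeError("tomllib requires Python 3.11+; use a backport on 3.10")


if __name__ == "__main__":
    if HAS_TOMLLIB:
        print(parse_toml('pythonVersion = "3.10"'))
    else:
        print("tomllib unavailable on this interpreter")
```

The unit-test matrix still exercises both branches at runtime, which is why the workflow keeps running tests on 3.10, 3.11, and 3.12 while type-checking only once.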

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -202,4 +202,7 @@ ingestion/.claude/agents
 .claude/scheduled_tasks.lock
 .claude/plans/
 
+# Serena MCP language-server cache — local tooling, not committed
+.serena/
+
 test-results/
