Commit 84ed278
authored
chore(ingestion): enable basedpyright across the codebase via baseline (#27755)
* chore(ingestion): enable basedpyright across the codebase via baseline
Removes the ~25 paths from `[tool.basedpyright] ignore` (which excluded
roughly 90% of the codebase from type checking) and grandfathers the
existing violations into a baseline file. New violations in any
previously-ignored file now fail CI.
Changes:
- ingestion/pyproject.toml: drop the entire `ignore = [...]` block
- ingestion/setup.py: bump `basedpyright~=1.14` to `~=1.39.0`
- ingestion/.basedpyright/baseline.json (new, ~13MB): captures the
starting violation set (~18.8K errors + ~37.4K warnings) so the
migration is behavior-preserving. Regenerate with
`cd ingestion && basedpyright -p pyproject.toml --baselinefile
.basedpyright/baseline.json --writebaseline`. basedpyright analysis
has minor non-determinism (similar to ruff's), so re-running
--writebaseline a few times converges the baseline.
- ingestion/noxfile.py: pass `--baselinefile .basedpyright/baseline.json`
to the basedpyright invocation in the `static-checks` session so CI
honors the grandfathering. CI already runs the session via
`cd ingestion && nox --no-venv -s static-checks` (py-tests.yml).
- ingestion/Makefile: `make static-checks` now delegates to
`nox -s static-checks` so local invocations match CI exactly. Also
drops the dead Python 3.9 / OM_SKIP_SDK_PY39 branch (we require
Python >=3.10 since the previous modernization PR).
- .gitignore: add `.serena/` (local language-server cache)
* chore(ingestion): add nox to the dev dependency set
The static-checks Makefile target and the py-tests CI job both delegate
to `nox -s static-checks`, but nox was being installed as a separate
side step (`pip install nox` in `install_dev_env`, `uv pip install nox`
in the test-environment composite action). Listing it in dev extras
means a plain `pip install ingestion[dev]` brings it in.
* chore(ingestion): pin basedpyright analysis to py3.10; CI runs once
Following the basedpyright + multi-Python-version research:
- ingestion/pyproject.toml: add `pythonVersion = "3.10"` to
[tool.basedpyright] so type-checking always analyzes for the lowest
supported Python version. Forward-incompatible code (tomllib usage,
PEP 695 generics, etc.) is caught at type-check time regardless of
which Python interpreter runs the checker.
- .github/workflows/py-tests.yml: gate the "Run Static Checks" step on
`matrix.py-version == '3.10'`. With pythonVersion pinned, results are
identical across the matrix; running once avoids redundant work and
keeps the baseline file deterministic. Unit tests still run on the
full 3.10/3.11/3.12 matrix to verify runtime compatibility.
- ingestion/.basedpyright/baseline.json: regenerated cleanly with the
new pythonVersion config (~18.8K errors / ~37.3K warnings, similar
scale to the previous baseline). Aligns with the canonical
type-check-on-floor / test-on-matrix pattern used by Pydantic, CPython,
and other major Python projects.
* chore(ingestion): pin basedpyright pythonPlatform to Linux + regen baseline
CI's previous run still surfaced ~9 issues (2 errors + 7 warnings) that
weren't in the baseline. Root cause: my local environment differs from
CI's in three ways that affect type inference — Python interpreter
(3.11 vs 3.10), platform (Darwin vs Linux), and pip-resolved package
versions (couchbase, avro, trino, sqlalchemy stubs all differ slightly).
This commit closes the platform gap and regenerates the baseline from a
fresh CI-equivalent environment:
- ingestion/pyproject.toml: add `pythonPlatform = "Linux"` to
[tool.basedpyright] so type-checking uses the Linux subset of stdlib /
third-party stubs regardless of where the analyzer runs.
- ingestion/.basedpyright/baseline.json: regenerated against a fresh
Python 3.10 venv installed via `uv pip install ingestion[test]` (the
same install path CI's setup-openmetadata-test-environment composite
action uses). New scale: ~18.7K errors / ~37.5K warnings — same
ballpark as the previous baseline, with column positions now matching
CI's environment.
Local-developer note: when running `make static-checks` from a venv
that doesn't mirror CI exactly (e.g. macOS, Python 3.11, different
package versions), you may see drift errors. The supported workflow for
regenerating the baseline is to mirror CI:
python3.10 -m venv /tmp/ci-mirror
source /tmp/ci-mirror/bin/activate
uv pip install --upgrade pip "setuptools<81"
uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
uv pip install -e "ingestion[test]"
uv pip install "basedpyright~=1.39.0" nox
cd ingestion && basedpyright -p pyproject.toml \
--baselinefile .basedpyright/baseline.json --writebaseline
* chore(ingestion): drop pythonPlatform pin and regen baseline from CI-mirror
The previous attempt added `pythonPlatform = "Linux"` thinking it would
make the local-generated baseline match CI. It did the opposite — Linux
platform stubs activate additional conditional code paths that weren't
analyzed before, so CI saw 101 errors instead of the prior 2 errors.
Reverting:
- Drop `pythonPlatform = "Linux"` from [tool.basedpyright]. Without it,
basedpyright analyzes for the host platform; on CI's ubuntu-latest
runner that's Linux automatically, but type-stub coverage stays the
same as before (matching the d9196df baseline).
- Regenerate ingestion/.basedpyright/baseline.json against a fresh
Python 3.10 venv installed via `uv pip install ingestion[test]`
(mirroring CI's setup-openmetadata-test-environment composite action).
~18.8K errors / 37.7K warnings captured — same scale as the working
d9196df version.
Local-developer note: any baseline regeneration done on macOS will drift
from CI's Linux env (different transitive package versions, different
stubs). The supported local mirror procedure:
python3.10 -m venv /tmp/ci-mirror
source /tmp/ci-mirror/bin/activate
uv pip install --upgrade pip "setuptools<81"
uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
uv pip install -e "ingestion[test]"
uv pip install "basedpyright~=1.39.0" nox
cd ingestion && basedpyright -p pyproject.toml \\
--baselinefile .basedpyright/baseline.json --writebaseline
* chore(ingestion): regen baseline from full CI install (mac arm64 mirror)
Prior CI-mirror only installed [test], skipping [all] and the four
--no-deps SA pins (sqlalchemy-redshift/databricks/ibmi, pydoris-custom).
That left ~75 connector packages out of the analysis env, so basedpyright
couldn't resolve types from databricks.sqlalchemy, GE 0.18 Batch,
sklearn BaseEstimator, airflow SQLAlchemy models, pandas/numpy stubs,
etc. CI saw 129 errors absent from the baseline.
Regenerated against a fresh py3.10 venv that mirrors
.github/actions/setup-openmetadata-test-environment exactly:
uv pip install ./ingestion[dev]
make generate
uv pip install "setuptools<81"
uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
uv pip install --no-deps sqlalchemy-redshift==0.8.14 \
sqlalchemy-databricks==0.2.0 \
sqlalchemy-ibmi==0.9.3 \
pydoris-custom==1.1.0
uv pip install ./ingestion[all]
uv pip install ./ingestion[test]
uv pip install nox
First run: 128 errors, 272 warnings — within 1 error of CI's 129/272.
Wrote baseline with 56,100 entries across 1,035 files. Verify run with
the new baseline reports 0/0/0.
macOS arm64 vs Linux x86_64 wheel resolution may leave a small residual
(~3-7 errors per the d9196df precedent). Re-run --writebaseline 2-3x
if any show up in CI.
* chore(ingestion): silence avro.py:95 basedpyright residual
CI's Linux fastavro stub returns Schema as `str | List[Any]`, while
the macOS arm64 wheel narrows to `str` — the only error not absorbed
by the regenerated baseline. Add a targeted pyright: ignore on the
parse_avro_schema call instead of broadening behavior.
* chore(ingestion): tolerate cross-platform pyright ignore drift
CI's `--baselinemode=lock` (default) requires the baseline to match
exactly — neither up nor down. Two related issues:
1. The avro.py noqa silenced not just the surfaced error but 10
cascading entries at line 95 (sub-errors propagating from the
unresolved `schema` arg type). Baseline went `down by 10` → lock
violated → exit 3 even with `0 errors` reported. Regenerate baseline
so the 10 stale entries are dropped.
2. The macOS arm64 fastavro stub doesn't surface that error in the
first place, so basedpyright treats the noqa as
`reportUnnecessaryTypeIgnoreComment` locally — causing the opposite
lock mismatch on CI (a warning entry that doesn't exist there).
Disable the rule so platform-specific residuals can land without
flapping between local and CI.
* chore(ingestion): use --baselinemode=discard for cross-platform tolerance
CI's implicit default is `lock`, which fails on any baseline change in
either direction (errors going up *or* down) via console.error → exit 3.
That cannot accommodate macOS arm64 vs Linux x86_64 stub drift: a
baseline regenerated locally always carries some entries that don't fire
on CI (and vice versa).
`auto` would tolerate the drift but silently overwrites the baseline
file — unacceptable in CI, where unreviewed changes never get committed
back.
`discard` is the right balance:
- New errors not in the baseline still fail the run (early-return path
in BaselineHandler.write before the lock/discard branch).
- Stale baseline entries (errors that no longer fire on the current
platform) print an info message and exit 0.
- The baseline file is never modified.1 parent d7e191c commit 84ed278
8 files changed
Lines changed: 450910 additions & 46 deletions
File tree
- .github/workflows
- ingestion
- .basedpyright
- src/metadata/readers/dataframe
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
99 | 99 | | |
100 | 100 | | |
101 | 101 | | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
102 | 107 | | |
103 | 108 | | |
104 | 109 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
202 | 202 | | |
203 | 203 | | |
204 | 204 | | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
205 | 208 | | |
0 commit comments