Skip to content

Fix Python remote eval parameter serialization in dev mode#210

Closed
ekeith (evanmkeith) wants to merge 1 commit into
mainfrom
05-27-fix-python-remote-eval-parameter-listing
Closed

Fix Python remote eval parameter serialization in dev mode#210
ekeith (evanmkeith) wants to merge 1 commit into
mainfrom
05-27-fix-python-remote-eval-parameter-listing

Conversation

@evanmkeith

Copy link
Copy Markdown

Summary

Fixes Python bt eval --dev /list responses for evals that define parameters={...}.

The Python eval runner was emitting parameters as a raw JSON Schema object via parameters_to_json_schema(...). The Braintrust UI expects remote eval parameters to use the serialized parameter container shape, such as braintrust.staticParameters or braintrust.parameters. As a result, /list could return 200 OK while the UI still failed to parse the evaluator manifest and showed the generic connection/listing error.

This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.

Test Plan

  • Add a Python list-mode fixture with a Pydantic parameter model.
  • Assert BT_EVAL_DEV_MODE=list emits parameters.type = "braintrust.staticParameters".
  • Run:
    • python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.py
    • cargo fmt --check
    • cargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture

## Summary

  Fixes Python `bt eval --dev` `/list` responses for evals that define `parameters={...}`.

  The Python eval runner was emitting parameters as a raw JSON Schema object via `parameters_to_json_schema(...)`. The Braintrust UI expects remote eval parameters to use the
  serialized parameter container shape, such as `braintrust.staticParameters` or `braintrust.parameters`. As a result, `/list` could return `200 OK` while the UI still failed to
  parse the evaluator manifest and showed the generic connection/listing error.

  This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.

  ## Test Plan

  - Add a Python list-mode fixture with a Pydantic parameter model.
  - Assert `BT_EVAL_DEV_MODE=list` emits `parameters.type = "braintrust.staticParameters"`.
  - Run:
    - `python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.py`
    - `cargo fmt --check`
    - `cargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture`
@github-actions

Copy link
Copy Markdown
Contributor

Latest downloadable build artifacts for this PR commit 1f4618877fd0:

Available artifact names
  • ``artifacts-build-global
  • ``artifacts-build-local-aarch64-pc-windows-msvc
  • ``artifacts-build-local-x86_64-apple-darwin
  • ``artifacts-build-local-x86_64-pc-windows-msvc
  • ``artifacts-build-local-x86_64-unknown-linux-musl
  • ``artifacts-build-local-aarch64-apple-darwin
  • ``artifacts-build-local-x86_64-unknown-linux-gnu
  • ``artifacts-build-local-aarch64-unknown-linux-gnu
  • ``artifacts-plan-dist-manifest
  • ``cargo-dist-cache

@spencerseale

Copy link
Copy Markdown

This is blocking on passing params to the playground from remote evals. Subscribed for when this ships

@AbhiPrasad

Copy link
Copy Markdown
Member

please rebase and we can get this merged in and released!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of it looks good though, and for most recent SDK installers they will not hit this branch.

Comment thread scripts/eval-runner.py
Comment on lines +587 to +606
static_parameters: dict[str, Any] = {}
properties = schema.get("properties", {})
if isinstance(properties, dict):
for name, property_schema in properties.items():
if not isinstance(property_schema, dict):
continue

parameter_type = property_schema.get("x-bt-type")
if parameter_type == "prompt":
parameter: dict[str, Any] = {"type": "prompt"}
elif parameter_type == "model":
parameter = {"type": "model"}
else:
parameter = {"type": "data", "schema": property_schema}

if "default" in property_schema:
parameter["default"] = property_schema["default"]
if isinstance(property_schema.get("description"), str):
parameter["description"] = property_schema["description"]
static_parameters[name] = parameter

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this section is necessary. Looking to understand more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary.

If you are using python SDK >= v0.10.0 serialize_remote_eval_parameters_container will be defined and the playground can handle the container type created for parameters

If you are using python SDK < v0.10.0 serialize_remote_eval_parameters_container will not be defined but parameters_to_json_schema won't actually create a JSON schema. In that old SDK version it was creating the object of JSON schemas that the playground also support.

So there's not a way of hitting this endpoint in which the new version of parameters_to_json_schema is called and returns an actual JSON schema. Which is the error that will cause an error in the playground?

@joshuawootonn

Copy link
Copy Markdown
Contributor

This PR is pretty similar to something I shipped last week: #206

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm realizing that this is almost the same PR I shipped last week, which is why rebasing is necessary. #206

I failed to check in on what the releasing policy is for this repo / for bt CLI so I don't think it's shipped yet. Sorry for dropping the ball everyone!

I would like us to avoid shipping this PR. I think we just need to get a new version dropped.

@evanmkeith

Copy link
Copy Markdown
Author

already fixed here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants