Fix Python remote eval parameter serialization in dev mode#210
Fix Python remote eval parameter serialization in dev mode#210ekeith (evanmkeith) wants to merge 1 commit into
Conversation
## Summary
Fixes Python `bt eval --dev` `/list` responses for evals that define `parameters={...}`.
The Python eval runner was emitting parameters as a raw JSON Schema object via `parameters_to_json_schema(...)`. The Braintrust UI expects remote eval parameters to use the
serialized parameter container shape, such as `braintrust.staticParameters` or `braintrust.parameters`. As a result, `/list` could return `200 OK` while the UI still failed to
parse the evaluator manifest and showed the generic connection/listing error.
This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.
## Test Plan
- Add a Python list-mode fixture with a Pydantic parameter model.
- Assert `BT_EVAL_DEV_MODE=list` emits `parameters.type = "braintrust.staticParameters"`.
- Run:
- `python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.py`
- `cargo fmt --check`
- `cargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture`
|
Latest downloadable build artifacts for this PR commit
Available artifact names
|
|
This is blocking on passing params to the playground from remote evals. Subscribed for when this ships |
|
please rebase and we can get this merged in and released! |
Joshua Wootonn (joshuawootonn)
left a comment
There was a problem hiding this comment.
The rest of it looks good though, and for most recent SDK installers they will not hit this branch.
| static_parameters: dict[str, Any] = {} | ||
| properties = schema.get("properties", {}) | ||
| if isinstance(properties, dict): | ||
| for name, property_schema in properties.items(): | ||
| if not isinstance(property_schema, dict): | ||
| continue | ||
|
|
||
| parameter_type = property_schema.get("x-bt-type") | ||
| if parameter_type == "prompt": | ||
| parameter: dict[str, Any] = {"type": "prompt"} | ||
| elif parameter_type == "model": | ||
| parameter = {"type": "model"} | ||
| else: | ||
| parameter = {"type": "data", "schema": property_schema} | ||
|
|
||
| if "default" in property_schema: | ||
| parameter["default"] = property_schema["default"] | ||
| if isinstance(property_schema.get("description"), str): | ||
| parameter["description"] = property_schema["description"] | ||
| static_parameters[name] = parameter |
There was a problem hiding this comment.
I'm not sure why this section is necessary. Looking to understand more.
There was a problem hiding this comment.
This is unnecessary.
If you are using python SDK >= v0.10.0 serialize_remote_eval_parameters_container will be defined and the playground can handle the container type created for parameters
If you are using python SDK < v0.10.0 serialize_remote_eval_parameters_container will not be defined but parameters_to_json_schema won't actually create a JSON schema. In that old SDK version it was creating the object of JSON schemas that the playground also support.
So there's not a way of hitting this endpoint in which the new version of parameters_to_json_schema is called and returns an actual JSON schema. Which is the error that will cause an error in the playground?
|
This PR is pretty similar to something I shipped last week: #206 |
Joshua Wootonn (joshuawootonn)
left a comment
There was a problem hiding this comment.
Ok I'm realizing that this is almost the same PR I shipped last week, which is why rebasing is necessary. #206
I failed to check in on what the releasing policy is for this repo / for bt CLI so I don't think it's shipped yet. Sorry for dropping the ball everyone!
I would like us to avoid shipping this PR. I think we just need to get a new version dropped.
|
already fixed here |
Summary
Fixes Python
bt eval --dev/listresponses for evals that defineparameters={...}.The Python eval runner was emitting parameters as a raw JSON Schema object via
parameters_to_json_schema(...). The Braintrust UI expects remote eval parameters to use the serialized parameter container shape, such asbraintrust.staticParametersorbraintrust.parameters. As a result,/listcould return200 OKwhile the UI still failed to parse the evaluator manifest and showed the generic connection/listing error.This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.
Test Plan
BT_EVAL_DEV_MODE=listemitsparameters.type = "braintrust.staticParameters".python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.pycargo fmt --checkcargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture