feat(phyai): Add PhyAI native LIBERO pi0.5 benchmark support#21
feat(phyai): Add PhyAI native LIBERO pi0.5 benchmark support#21rebecca26358 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request integrates the PhyAI pi0.5 model server with LIBERO, adding observation processing pipelines, high-level policy wrappers, a model server for the evaluation harness, and a checkpoint conversion tool. It also refactors the eager attention backend to support non-contiguous KV cache layouts and adds a CPU fallback for gated activations. Key feedback includes clipping digitized state values to prevent out-of-bounds bin indexing, safely handling null image resolutions in the configuration, flattening state arrays to handle extra batch dimensions robustly, and validating gated activation functions early when forcing the PyTorch fallback.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| state_np = states.detach().cpu().numpy() | ||
| bins = np.linspace(-1.0, 1.0, 257)[:-1] | ||
| discretized = np.digitize(state_np, bins=bins) - 1 |
There was a problem hiding this comment.
If any state value in state_np is less than -1.0 (due to noise or numerical precision), np.digitize will return 0, resulting in a bin index of -1. This -1 will be formatted as "-1" in the prompt, which tokenizes into two separate tokens ("-" and "1") instead of a single bin token, breaking the prompt format. Consider clipping the discretized bin indices to [0, 255] to ensure they are always valid.
| state_np = states.detach().cpu().numpy() | |
| bins = np.linspace(-1.0, 1.0, 257)[:-1] | |
| discretized = np.digitize(state_np, bins=bins) - 1 | |
| state_np = states.detach().cpu().numpy() | |
| bins = np.linspace(-1.0, 1.0, 257)[:-1] | |
| discretized = np.clip(np.digitize(state_np, bins=bins) - 1, 0, 255) |
| self.image_size = int( | ||
| image_size | ||
| or self.config.get("image_resolution", [224, 224])[0] | ||
| or 224 | ||
| ) |
There was a problem hiding this comment.
If "image_resolution" is explicitly set to null in the JSON configuration file, self.config.get("image_resolution", [224, 224]) will return None, causing None[0] to raise a TypeError: 'NoneType' object is not subscriptable. Consider safely checking the type and value of "image_resolution" before indexing.
image_res = self.config.get("image_resolution")
self.image_size = int(
image_size
or (image_res[0] if isinstance(image_res, list) and image_res else None)
or 224
)| arr = np.asarray(state, dtype=np.float32) | ||
| if arr.shape != (8,): | ||
| raise ValueError(f"Expected LIBERO state shape (8,), got {arr.shape}") |
There was a problem hiding this comment.
If the state has an extra batch dimension (e.g., shape (1, 8)), arr.shape != (8,) will raise a ValueError. Flattening the array first ensures it is robust to such shape variations.
| arr = np.asarray(state, dtype=np.float32) | |
| if arr.shape != (8,): | |
| raise ValueError(f"Expected LIBERO state shape (8,), got {arr.shape}") | |
| arr = np.asarray(state, dtype=np.float32).flatten() | |
| if arr.shape != (8,): | |
| raise ValueError(f"Expected LIBERO state shape (8,), got {arr.shape}") |
| if os.environ.get("PHYAI_FORCE_TORCH_GATED_ACT") == "1": | ||
| return _torch_gated_act(canon) |
There was a problem hiding this comment.
When PHYAI_FORCE_TORCH_GATED_ACT is set to "1", _torch_gated_act(canon) is returned immediately without validating if canon is a supported activation function. This delays the ValueError until the first forward pass. Consider validating canon inside _torch_gated_act or before returning it to fail early during model construction.
| if os.environ.get("PHYAI_FORCE_TORCH_GATED_ACT") == "1": | |
| return _torch_gated_act(canon) | |
| if os.environ.get("PHYAI_FORCE_TORCH_GATED_ACT") == "1": | |
| if canon not in _GATED_ACTS: | |
| raise ValueError( | |
| f"Unsupported gated activation {name!r}; expected one of {_GATED_ACTS!r}." | |
| ) | |
| return _torch_gated_act(canon) |
| @@ -0,0 +1,19 @@ | |||
| # PhyAI pi0.5 model server — LIBERO smoke/integration config. | |||
There was a problem hiding this comment.
- do not use chinese comments.
- why using PHYAI_FORCE_TORCH_GATED_ACT ?
| def _resolve_tokenizer_path(self, name_or_path: str) -> str: | ||
| local_candidates = { | ||
| "google/paligemma-3b-pt-224": [ | ||
| Path("/mnt/data2/shared_models/paligemma-3b-pt-224"), |
There was a problem hiding this comment.
- absolute path is not allowed
| "numpy", | ||
| "safetensors", | ||
| "torch==2.11", | ||
| "transformers==5.8.1", |
There was a problem hiding this comment.
- is transformers really needed
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds ChangesPI05LiberoPolicy LIBERO Adapter
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
phyai-utils-tools/tests/test_pi05_libero_pipeline.py (1)
23-24:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMake this test hermetic; avoid machine-specific absolute checkpoint paths.
Hard-coding
/mnt/data2/...makes this test environment-dependent and brittle in CI/dev machines that lack that mount.Minimal stabilization option
+import os +import pytest ... - checkpoint = Path("/mnt/data2/shared_models/pi05_libero_finetuned_v044") + checkpoint_env = os.environ.get("PI05_LIBERO_TEST_CHECKPOINT") + if not checkpoint_env: + pytest.skip("Set PI05_LIBERO_TEST_CHECKPOINT to run this integration-backed test.") + checkpoint = Path(checkpoint_env)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phyai-utils-tools/tests/test_pi05_libero_pipeline.py` around lines 23 - 24, The test has a hard-coded machine-specific absolute path `/mnt/data2/shared_models/pi05_libero_finetuned_v044` for the checkpoint parameter in the PI05LiberoPipeline instantiation, which makes the test fail on machines without that mount and breaks in CI environments. Replace this absolute path with a hermetic approach such as using a pytest fixture that provides a valid checkpoint path, mocking the checkpoint parameter, using a relative test resource path, or employing an environment variable with a fallback to test data bundled with the repository. Ensure the checkpoint assignment on line 23 no longer references the machine-specific mount path.
🧹 Nitpick comments (2)
tools/convert_openpi_pi05_to_phyai.py (1)
185-186: 💤 Low valuePrefer direct attribute access over
getattrwith constant attribute.Using
getattr(node, "shape")provides no additional safety over direct attribute access when the attribute name is a constant string.Suggested fix
- shape = tuple(getattr(node, "shape")) - dtype = getattr(node, "dtype", None) + shape = tuple(node.shape) + dtype = getattr(node, "dtype", None) # dtype may be absent, keep getattr with default🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/convert_openpi_pi05_to_phyai.py` around lines 185 - 186, Replace the getattr calls with direct attribute access since both "shape" and "dtype" are constant attribute names. Change the line `shape = tuple(getattr(node, "shape"))` to `shape = tuple(node.shape)`. For the dtype line using `getattr(node, "dtype", None)`, either change it to direct attribute access `dtype = node.dtype` if the attribute is guaranteed to exist, or use a hasattr check if the attribute may not be present, rather than relying on getattr with a default value for a constant attribute name.Source: Linters/SAST tools
phyai-utils-tools/src/phyai_utils_tools/pipeline/pi05_libero.py (1)
323-327: ⚡ Quick winHarden mean/std unnormalization against stat/action dimensional mismatches.
_unnormalize_action()handles dim mismatch in the quantile path but not in the mean/std path. A checkpoint with padded stats can trigger a broadcast failure here.Proposed fix
mean = self._unnormalizer_stats.get("action.mean") std = self._unnormalizer_stats.get("action.std") if mean is None or std is None: return action - return action * torch.clamp(std.to(action), min=1e-8) + mean.to(action) + mean_t = mean.to(action) + std_t = torch.clamp(std.to(action), min=1e-8) + dim = min(action.shape[-1], mean_t.shape[-1], std_t.shape[-1]) + head = action[..., :dim] * std_t[..., :dim] + mean_t[..., :dim] + if dim == action.shape[-1]: + return head + return torch.cat([head, action[..., dim:]], dim=-1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phyai-utils-tools/src/phyai_utils_tools/pipeline/pi05_libero.py` around lines 323 - 327, The mean/std unnormalization path in the _unnormalize_action() method does not handle dimensional mismatches between the stats tensors and the action tensor, unlike the quantile path. When checkpoint stats are padded, the broadcast operation on the return line can fail. Add dimensional mismatch handling to the mean and std tensors before they are used in the unnormalization calculation (reshape or slice them to match the action tensor dimensions), similar to the pattern used in the quantile unnormalization path within the same method.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@phyai/src/phyai/models/pi05/modeling_pi05.py`:
- Line 603: The docstring for the embed_lang() function contains outdated
information about vision embedding scaling that no longer reflects the current
behavior after the recent changes to embed_scale and removal of vision-side
scaling. Update the docstring to remove or correct the reference to
sqrt(projection_dim) scaling for vision embeddings, ensuring the documentation
accurately describes the current scaling behavior and does not mislead future
modifications.
In `@tools/convert_openpi_pi05_to_phyai.py`:
- Around line 545-549: The temporary file created by mkstemp is not cleaned up
if save_file or os.replace raises an exception, leading to orphaned files
accumulating on disk. Wrap the save_file and os.replace calls in a try-finally
block to ensure the temporary file is properly deleted even when exceptions
occur. In the finally block, check if the temporary file still exists and remove
it using os.remove to guarantee cleanup regardless of success or failure.
---
Duplicate comments:
In `@phyai-utils-tools/tests/test_pi05_libero_pipeline.py`:
- Around line 23-24: The test has a hard-coded machine-specific absolute path
`/mnt/data2/shared_models/pi05_libero_finetuned_v044` for the checkpoint
parameter in the PI05LiberoPipeline instantiation, which makes the test fail on
machines without that mount and breaks in CI environments. Replace this absolute
path with a hermetic approach such as using a pytest fixture that provides a
valid checkpoint path, mocking the checkpoint parameter, using a relative test
resource path, or employing an environment variable with a fallback to test data
bundled with the repository. Ensure the checkpoint assignment on line 23 no
longer references the machine-specific mount path.
---
Nitpick comments:
In `@phyai-utils-tools/src/phyai_utils_tools/pipeline/pi05_libero.py`:
- Around line 323-327: The mean/std unnormalization path in the
_unnormalize_action() method does not handle dimensional mismatches between the
stats tensors and the action tensor, unlike the quantile path. When checkpoint
stats are padded, the broadcast operation on the return line can fail. Add
dimensional mismatch handling to the mean and std tensors before they are used
in the unnormalization calculation (reshape or slice them to match the action
tensor dimensions), similar to the pattern used in the quantile unnormalization
path within the same method.
In `@tools/convert_openpi_pi05_to_phyai.py`:
- Around line 185-186: Replace the getattr calls with direct attribute access
since both "shape" and "dtype" are constant attribute names. Change the line
`shape = tuple(getattr(node, "shape"))` to `shape = tuple(node.shape)`. For the
dtype line using `getattr(node, "dtype", None)`, either change it to direct
attribute access `dtype = node.dtype` if the attribute is guaranteed to exist,
or use a hasattr check if the attribute may not be present, rather than relying
on getattr with a default value for a constant attribute name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b2b82cf2-b245-43ad-9c57-ef2603549d59
📒 Files selected for processing (9)
phyai-utils-tools/pyproject.tomlphyai-utils-tools/src/phyai_utils_tools/pipeline/__init__.pyphyai-utils-tools/src/phyai_utils_tools/pipeline/pi05_libero.pyphyai-utils-tools/src/phyai_utils_tools/pipeline/processors.pyphyai-utils-tools/tests/test_pi05_libero_pipeline.pyphyai/src/phyai/models/pi05/modeling_pi05.pyphyai/src/phyai/policies/__init__.pyphyai/src/phyai/policies/pi05_libero.pytools/convert_openpi_pi05_to_phyai.py
bcf6d97 to
02e940a
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@phyai/src/phyai/policies/pi05_libero.py`:
- Around line 159-170: The camera mode and tokenizer path logic in PI05Libero is
reading PHYAI_CAMERA_MODE and PHYAI_TOKENIZER_PATH directly from os.environ;
move both settings into phyai.env first and then consume that shared env API
from _resolve_camera_names and _resolve_tokenizer_name. Update the policy to use
the project’s environment accessors/constants instead of ad hoc os.environ
lookups so these new PHYAI_* settings are declared centrally in phyai.env.
- Around line 264-268: The camera selection in pi05_libero’s observation lookup
is too permissive because the fallback through
candidates.extend(images.values()) can silently pick an arbitrary camera like
agentview instead of failing when a wrist alias is missing. Update the lookup
logic in the policy code around the candidate resolution to only accept the
explicit aliases from keys (and any direct obs lookup), and remove the catch-all
images.values() fallback so missing expected cameras surface as a clear KeyError
rather than returning the wrong image.
- Around line 86-91: The compat normalization loading in the policy
initialization should fail fast instead of allowing missing stats to fall back
to raw scaling. Update the `_load_processor_state` usage in `PI05LiberoPolicy`
so that when `_use_phyai_compat` is enabled, missing `policy_preprocessor.json`
or `policy_postprocessor.json` tensors trigger an explicit error during setup
rather than continuing with unset normalizer/unnormalizer state; keep the check
close to `_normalizer_stats` and `_unnormalizer_stats` assignment so bad
checkpoint conversions are surfaced immediately.
- Around line 363-370: The prompt-building path in the state discretization
logic currently allows out-of-range bin IDs because `np.digitize(...)-1` can
produce `-1` for values below the lower bound. Update the `discretized` handling
in the `pi05_libero.py` prompt construction flow so the per-sample `state_bins`
are clamped into the valid bin index range before `state_str` is assembled,
keeping the `prompts.append(...)` output free of negative state IDs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 55cbf657-91ab-441f-aaea-cc8373a37aba
📒 Files selected for processing (2)
phyai/src/phyai/policies/__init__.pyphyai/src/phyai/policies/pi05_libero.py
✅ Files skipped from review due to trivial changes (1)
- phyai/src/phyai/policies/init.py
02e940a to
02e4160
Compare
Summary
Validation
Notes
Summary by CodeRabbit
New Features
Bug Fixes