Introduce benchmark CLI, model registry, safe-control components, Docker/Compose updates, and tests by vtavakkoli · Pull Request #45 · vtavakkoli/Agentic-RAN

vtavakkoli · 2026-05-07T06:13:19Z

Motivation

Pivot the repository from scenario-run scripts to a unified benchmarking workflow focused on main vs appendix model scopes and safe agentic control evaluation.
Provide a compact, testable suite of model interfaces, lightweight model stubs, and a safe policy layer to enable reproducible offline control and forecasting comparisons.
Update container orchestration and entrypoints so Docker-driven workflows run the new benchmark/report pipeline by default.

Description

Add a new src package with benchmark.py, ranking.py, and report.py that loads model lists from configs/benchmark_models.yaml, synthesizes mock metrics, produces CSV leaderboards, and renders a simple HTML report; wire the new CLI in as the Docker/Compose entrypoint via CMD and service commands.
Create configs/benchmark_models.yaml to declare the main, appendix, and optional_foundation_models groups; add model stubs and implementations under models/ (e.g. gradient_boosting.py, graph_actor_critic_ran.py, masked_graph_ppo_ran.py, safegraphagent_ran.py, base.py); and add policies/safe_policy_layer.py, which implements a safe-fallback enforcement function.
Update Dockerfile and docker-compose.yml to copy new folders (src, models, policies, configs) and expose new compose targets benchmark-main, benchmark-appendix, benchmark-all, and report; add PYTHONPATH=/app in the image.
Refresh README.md to describe the new benchmark scope, commands (python -m src.benchmark --benchmark-scope main), outputs, and scientific notes about pseudo-labels.
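
As a rough illustration of the scope selection described above, the sketch below resolves a --benchmark-scope value to a model list. It is not the code in src/benchmark.py: the CONFIG dictionary is a hypothetical stand-in for the parsed configs/benchmark_models.yaml, the assignment of models to groups is a placeholder, and the set of accepted scope values (main, appendix, all) is inferred from the compose target names.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical stand-in for the parsed configs/benchmark_models.yaml.
# Group names follow the PR description; which model sits in which
# group is a placeholder, not taken from the repository.
CONFIG: Dict[str, List[str]] = {
    "main": ["gradient_boosting", "safegraphagent_ran"],
    "appendix": ["graph_actor_critic_ran", "masked_graph_ppo_ran"],
    "optional_foundation_models": [],
}


def load_models(config: Dict[str, List[str]], scope: str) -> List[str]:
    """Return the model names selected by a benchmark scope."""
    if scope == "all":
        # Assumption: "all" means main followed by appendix.
        return list(config["main"]) + list(config["appendix"])
    if scope not in ("main", "appendix"):
        raise ValueError(f"unknown benchmark scope: {scope!r}")
    return list(config[scope])


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the single flag shown in the README command."""
    parser = argparse.ArgumentParser(prog="benchmark")
    parser.add_argument(
        "--benchmark-scope",
        choices=["main", "appendix", "all"],
        default="main",
    )
    return parser.parse_args(argv)


args = parse_args(["--benchmark-scope", "all"])
selected = load_models(CONFIG, args.benchmark_scope)
```

Keeping the scope resolution in one small function makes it straightforward to unit test without touching the filesystem or Docker.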

Testing

Ran the unit test suite with pytest -q, covering tests/test_benchmark_scope.py, tests/test_gradient_boosting_baseline.py, tests/test_report_sections.py, and tests/test_safe_policy_layer.py; all tests passed.
Unit tests validate load_models scoping, GradientBoostingBaseline.fit/predict basic behavior, report HTML sections via src.report.main, and fallback behavior of SafePolicyLayer.enforce.
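
For readers unfamiliar with the fallback behavior being tested, here is a minimal sketch of what a safe-policy enforcement step could look like. The class name, the enforce signature, the allowed-action set, and the choice of action 0 as the safe action are all assumptions for illustration; the actual SafePolicyLayer.enforce in policies/safe_policy_layer.py may differ.

```python
from typing import Collection, Tuple


class SafePolicyLayerSketch:
    """Hypothetical stand-in for the PR's SafePolicyLayer; not its real code."""

    def __init__(self, safe_action: int = 0) -> None:
        # Assumption: one fixed action index is designated as the safe fallback.
        self.safe_action = safe_action

    def enforce(self, action: int, allowed: Collection[int]) -> Tuple[int, bool]:
        """Return (action_to_apply, fell_back).

        The proposed action passes through unchanged when it is allowed;
        otherwise it is replaced by the fixed safe action.
        """
        if action in allowed:
            return action, False
        return self.safe_action, True


layer = SafePolicyLayerSketch(safe_action=0)
kept = layer.enforce(2, allowed={0, 1, 2})      # allowed, so unchanged
replaced = layer.enforce(3, allowed={0, 1, 2})  # not allowed, so fallback
```

Returning the fallback flag alongside the action lets a caller count fallbacks directly instead of inferring them from the action values.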

Codex Task

…nchmark-3xah8h

chatgpt-codex-connector · 2026-05-07T06:15:39Z

💡 Codex Review

Agentic-RAN/models/safegraphagent_ran.py

Lines 14 to 16 in 4378282

    
        self.actor = nn.Linear(hidden, num_actions)
        self.critic = nn.Linear(hidden, 1)
        self.safe = SafePolicyLayer()

Implement forward pass in SafeGraphAgentRAN

SafeGraphAgentRAN subclasses nn.Module but only defines __init__, so invoking the model (e.g., during training/inference) raises NotImplementedError because no forward method exists. Since this model is included in main_models, any pipeline that tries to execute it will fail at runtime instead of producing control outputs.
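
One possible shape for the missing forward method is sketched below. This is a hedged illustration rather than the fix for SafeGraphAgentRAN: the class name, the encoder layer, and the default sizes are assumptions (the excerpt only shows the actor, critic, and safe attributes), and the SafePolicyLayer call is left out because its signature is not shown in the PR.

```python
import torch
from torch import nn


class SafeGraphAgentSketch(nn.Module):
    """Hypothetical stand-in for SafeGraphAgentRAN with a forward pass."""

    def __init__(self, in_dim: int = 8, hidden: int = 16, num_actions: int = 4):
        super().__init__()
        # Assumption: a single linear encoder; the PR excerpt shows none.
        self.encoder = nn.Linear(in_dim, hidden)
        self.actor = nn.Linear(hidden, num_actions)
        self.critic = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor):
        """Map a batch of features to (action logits, state values)."""
        h = torch.relu(self.encoder(x))
        logits = self.actor(h)              # shape: (batch, num_actions)
        value = self.critic(h).squeeze(-1)  # shape: (batch,)
        return logits, value


model = SafeGraphAgentSketch()
logits, value = model(torch.zeros(3, 8))
```

With forward defined, calling the module no longer hits the NotImplementedError raised by the nn.Module default.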

Agentic-RAN/models/masked_graph_ppo_ran.py

Line 12 in 4378282

    
           "safe_fallback_rate": sum(int(a == (max(actions) if actions else 0)) for a in actions) / n,

Compute safe fallback rate against fixed safe action

offline_policy_eval currently defines safe_fallback_rate as the fraction of actions equal to max(actions). That measures how often the highest action index present in the batch occurs, not how often the actual safe-fallback action was taken. For a trace with no fallback action (e.g., actions [0,1,2,1]), it still reports a nonzero fallback rate, which corrupts the safety metrics and the downstream ranking/report conclusions.
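
The difference can be made concrete with a small sketch. The first function reproduces the flagged expression; the second compares against a fixed safe action. The definition of n as the trace length (floored at 1) and the value of SAFE_ACTION are assumptions, since neither appears in the excerpt.

```python
from typing import Sequence

SAFE_ACTION = 3  # assumption: the index reserved for the safe fallback


def flagged_rate(actions: Sequence[int]) -> float:
    """The expression from the review, with n assumed to be the trace length."""
    n = max(len(actions), 1)
    return sum(int(a == (max(actions) if actions else 0)) for a in actions) / n


def safe_fallback_rate(actions: Sequence[int], safe_action: int = SAFE_ACTION) -> float:
    """Fraction of actions equal to a fixed, known safe-fallback action."""
    if not actions:
        return 0.0
    return sum(1 for a in actions if a == safe_action) / len(actions)


trace = [0, 1, 2, 1]  # no fallback action present
flagged = flagged_rate(trace)
corrected = safe_fallback_rate(trace)
```

On this trace the flagged expression counts the single occurrence of action 2, while the fixed-action version reports no fallbacks.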


Fix benchmark module packaging and streamline benchmark docs/services

4378282

vtavakkoli added the codex label May 7, 2026 — with ChatGPT Codex Connector

Merge branch 'main' into codex/clean-repository-and-implement-main-be…

e91a308

…nchmark-3xah8h

vtavakkoli wants to merge 2 commits into main from codex/clean-repository-and-implement-main-benchmark-3xah8h
