vtavakkoli · vtavakkoli · May 7, 2026 · May 7, 2026
diff --git a/Dockerfile b/Dockerfile
@@ -1,7 +1,8 @@
 FROM python:3.12-slim
 
 ENV PYTHONDONTWRITEBYTECODE=1 \
-    PYTHONUNBUFFERED=1
+    PYTHONUNBUFFERED=1 \
+    PYTHONPATH=/app
 
 WORKDIR /app
 
@@ -10,7 +11,11 @@ RUN pip install --no-cache-dir -r /app/requirements.txt
 
 COPY agentic_ran /app/agentic_ran
 COPY scripts /app/scripts
+COPY src /app/src
+COPY models /app/models
+COPY policies /app/policies
+COPY configs /app/configs
 
 RUN mkdir -p /app/results /app/shared_data
 
-CMD ["python", "-m", "scripts.run_scenario", "--scenario", "lightweight-32"]
+CMD ["python", "-m", "src.benchmark", "--benchmark-scope", "main"]
diff --git a/README.md b/README.md
@@ -1,142 +1,42 @@
-# Agentic RAN Traffic-Aware Control Framework
+# Agentic-RAN Benchmark
 
-## Project goal
-This repository now targets an **agentic, traffic-aware RAN control workflow**:
-1. Observe slice-level RAN state from `*_metrics.csv`.
-2. Engineer temporal + traffic-class + schedule-aware context.
-3. Predict throughput (`tx_brate downlink [Mbps]`).
-4. Recommend interpretable control actions (PRB change / scheduler switch).
+This repository benchmarks slice-aware RAN forecasting and safe agentic control.
 
-The proposed method family is **Liquid Dynamics** (represented by `liquid-baseline`) and is benchmarked against:
-- lightweight MLP (`lightweight-32`, `lightweight-64`)
-- balanced MLP (`balanced-small`, `balanced-medium`)
-- deep MLP (`deep-performance`)
-- ultra-performance MLP (`ultra-performance`)
-- attention-based sequence modeling (`attention-baseline`)
-- xLSTM (`xlstm-baseline`)
+## Main benchmark scope
+The default benchmark focuses on:
+- Time-aware tabular/residual forecasting models
+- Strong gradient boosting baseline
+- Graph-aware actor-critic and masked PPO control baselines
+- **SafeGraphAgent-RAN** proposed method
 
-## Why this repository is agentic
-The framework does not stop at KPI prediction. It builds context-aware slice state, selects an action from an explicit action space, and records a human-readable reason and confidence for each decision.
+Appendix temporal models remain available but are excluded from the default main scope.
 
-## Dataset structure and attribution
-### Documented input folder
-Use `dataset/` as the canonical input location for raw CSV data preparation.
-
-Expected layout:
-- `dataset/slice_mixed/**/*_metrics.csv`
-- `dataset/slice_traffic/**/*_metrics.csv`
-
-`prepare_splits.py` accepts `dataset/` directly and recursively scans only `*_metrics.csv` files.
-
-### Tested reference dataset
-Data preparation is tested with the **Colosseum O-RAN COMMAG Dataset** associated with:
-> L. Bonati, S. D'Oro, M. Polese, S. Basagni, T. Melodia, “Intelligence and Learning in O-RAN for Data-driven NextG Cellular Networks,” IEEE Communications Magazine, vol. 59, no. 10, pp. 21–27, October 2021.
-
-Please cite that paper if you use the dataset in a publication.
-
-## Target column and feature handling
-- The prepared benchmark dataset always uses an explicit target column named **`target`**.
-- During preparation, you can explicitly set the raw target column with:
-  ```bash
-  python -m scripts.prepare_splits --target-col "tx_brate downlink [Mbps]"
-  ```
-- If `--target-col` is not set, preparation defaults to `tx_brate downlink [Mbps]` (or `ratio_granted_req` for URLLC experiments via `--target-col ratio_granted_req`).
-- **Actual source feature names are preserved** (no remapping to `feature_0`, `feature_1`, ...).
-
-## Requirements
-- Python **3.12**
-- PyTorch (CPU-compatible build by default in Docker)
-- Docker + Docker Compose
-
-Dependencies are declared in `requirements.txt`.
-
-## Repository structure
-- `agentic_ran/`
-  - `data_loading.py`: dataset loading and fallback behavior
-  - `preprocessing.py`: feature extraction, scaling, sequence building, splitting
-  - `models.py`: model factory and architectures
-  - `training.py`: training loop
-  - `evaluation.py`: metric computation and composite scoring
-  - `reporting.py`: outputs and plots
-  - `scenarios.py`: scenario catalog and hyperparameters
-- `scripts/prepare_splits.py`: raw-data preparation and train/val/test split generation
-- `scripts/run_scenario.py`: run one scenario
-- `scripts/run_all.py`: end-to-end prepare + run + aggregate
-- `scripts/aggregate_report.py`: final benchmark report generation (`results/report.html`)
-
-## Temporal and traffic-aware modeling assumptions
-- Base stations: 4, slices per BS: 3, 15 PRBs (3 MHz).
-- Slice semantics: slice 0 = eMBB, slice 1 = MTC, slice 2 = URLLC.
-- Scheduling policy IDs: 0=RR, 1=WF, 2=PF.
-- Dynamic slice-resizing is encoded through experiment-second phase features.
-
-## Action space (agentic policy)
-- 0 keep_allocation
-- 1/2/3 increase_{embb|mtc|urllc}_prb
-- 4/5/6 decrease_{embb|mtc|urllc}_prb
-- 7/8/9 switch_to_{rr|wf|pf}
-
-> If action labels are used for training, they are pseudo-labels generated by the deterministic rule-based policy and **not operator ground truth**.
-
-## Experiments workflow
-### 1) Prepare data
+## Commands
+### Docker
 ```bash
 docker compose up --build prepare-data
-```
-Equivalent local command:
-```bash
-python -m scripts.prepare_splits \
-  --input-dir dataset/slice_mixed \
-  --input-dir dataset/slice_traffic \
-  --output-dir shared_data/splits
+docker compose up --build benchmark-main
+docker compose up --build benchmark-appendix
+docker compose up --build benchmark-all
+docker compose up --build report
 ```
 
-### 2) Run a single scenario
+### Local CLI
 ```bash
-docker compose up lightweight-32
+python -m src.benchmark --benchmark-scope main
+python -m src.benchmark --benchmark-scope appendix
+python -m src.benchmark --benchmark-scope all
+python -m src.benchmark --benchmark-scope foundation
+python -m src.report
 ```
-or
-```bash
-python -m scripts.run_scenario --scenario lightweight-32
-```
-
-### 3) Run complete benchmark
-```bash
-docker compose up --build run-all
-```
-
-### 4) Generate agentic decisions for one scenario
-```bash
-python -m scripts.run_scenario --scenario agentic_residual_mlp --use-action-head
-```
-Outputs:
-- `results/<scenario>/agentic_decisions.csv`
-- `results/<scenario>/agentic_summary.json`
 
-## Reproducibility guidance
-- Splits are time-aware and chronological per file (60/10/30 train/val/test).
-- Reuse the same files in `shared_data/splits/` across scenario runs.
-- Pin epochs with `EPOCHS=<N>` when comparing architectures.
-- Preserve run artifacts under `results/<scenario>/` (metrics, metadata, predictions, training logs, plots).
-- Track `shared_data/splits/summary.json` to capture file provenance, source target columns, and selected feature names.
+## Outputs
+- `results/main_benchmark.csv`
+- `results/appendix_benchmark.csv`
+- `results/model_ranking.csv`
+- `results/control_ranking.csv`
+- `results/safegraphagent_ran_metrics.csv`
+- `results/report.html`
 
-## Final report interpretation
-The report in `results/report.html` includes cumulative and per-metric views.
-
-- **Higher is better**: `R2`, `composite_score`
-- **Lower is better**: `RMSE`, `MAE`, `MAPE`, `sMAPE`, `wMAPE`
-
-The current benchmark report ranks **`liquid-baseline`** as best under cumulative composite score.
-However, you should also inspect individual metrics separately because other baselines may win on specific metrics (e.g., stronger `R2` or lower `RMSE`).
-
-## License
-This project is licensed under the **MIT License**. See `LICENSE`.
-
-
-## New benchmark scopes
-```bash
-docker-compose up --build benchmark-main
-docker-compose up --build benchmark-appendix
-docker-compose up --build benchmark-all
-docker-compose up --build report
-```
+## Scientific note
+Pseudo-label action metrics are not enough to prove real control quality. Use offline reward proxies, safety fallback behavior, and constraint adherence for control assessment.
diff --git a/configs/benchmark_models.yaml b/configs/benchmark_models.yaml
@@ -1,5 +1,31 @@
-{
-  "main_models": ["mlp_lightweight_32","mlp_balanced_small","mlp_with_time_features","residual_mlp_128","kan_baseline","gradient_boosting_baseline","agentic_residual_mlp","graph_actor_critic_ran","masked_graph_ppo_ran","safegraphagent_ran"],
-  "appendix_models": ["attention_baseline","liquid_baseline","xlstm_baseline","residual_tcn_16","residual_tcn_32","residual_liquid_tcn_16","residual_liquid_tcn_32","patchtst_baseline","tsmixer_baseline","agentic_liquid_residual","agentic_sequence_attention","agentic_patch_kan_mixer"],
-  "optional_foundation_models": ["chronos_bolt","timesfm","tiny_time_mixer","moirai"]
-}
+main_models:
+  - mlp_lightweight_32
+  - mlp_balanced_small
+  - mlp_with_time_features
+  - residual_mlp_128
+  - kan_baseline
+  - gradient_boosting_baseline
+  - agentic_residual_mlp
+  - graph_actor_critic_ran
+  - masked_graph_ppo_ran
+  - safegraphagent_ran
+
+appendix_models:
+  - attention_baseline
+  - liquid_baseline
+  - xlstm_baseline
+  - residual_tcn_16
+  - residual_tcn_32
+  - residual_liquid_tcn_16
+  - residual_liquid_tcn_32
+  - patchtst_baseline
+  - tsmixer_baseline
+  - agentic_liquid_residual
+  - agentic_sequence_attention
+  - agentic_patch_kan_mixer
+
+optional_foundation_models:
+  - chronos_bolt
+  - timesfm
+  - tiny_time_mixer
+  - moirai
diff --git a/docker-compose.yml b/docker-compose.yml
@@ -4,123 +4,42 @@ x-common: &common
   volumes:
     - ./agentic_ran:/app/agentic_ran
     - ./scripts:/app/scripts
+    - ./src:/app/src
+    - ./models:/app/models
+    - ./policies:/app/policies
+    - ./configs:/app/configs
     - ./shared_data:/app/shared_data
     - ./results:/app/results
   environment:
     EPOCHS: ${EPOCHS:-5}
 
 services:
-
   prepare-data:
     <<: *common
     command: python -m scripts.prepare_splits --input-dir dataset/slice_mixed --input-dir dataset/slice_traffic --output-dir shared_data/splits
     volumes:
       - ./agentic_ran:/app/agentic_ran
       - ./scripts:/app/scripts
       - ./dataset:/app/dataset
-      - ./shared_data:/app/shared_data
-      - ./results:/app/results
-
-  lightweight-32:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario lightweight-32
-
-  lightweight-64:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario lightweight-64
-
-  balanced-small:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario balanced-small
-
-  balanced-medium:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario balanced-medium
-
-  deep-performance:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario deep-performance
-
-  ultra-performance:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario ultra-performance
-
-  attention-baseline:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario attention-baseline
-
-  liquid-baseline:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario liquid-baseline
-
-  xlstm-baseline:
-    <<: *common
-    depends_on:
-      - prepare-data
-    command: python -m scripts.run_scenario --scenario xlstm-baseline
-
-  aggregator:
-    <<: *common
-    command: python -m scripts.aggregate_report
-    depends_on:
-      - lightweight-32
-      - lightweight-64
-      - balanced-small
-      - balanced-medium
-      - deep-performance
-      - ultra-performance
-      - attention-baseline
-      - liquid-baseline
-      - xlstm-baseline
-
-  run-all:
-    <<: *common
-    command: python -m scripts.run_all
-    volumes:
-      - ./agentic_ran:/app/agentic_ran
-      - ./scripts:/app/scripts
-      - ./dataset:/app/dataset
-      - ./shared_data:/app/shared_data
-      - ./results:/app/results
-
-  full-run:
-    <<: *common
-    command: python -m scripts.run_all
-    volumes:
-      - ./agentic_ran:/app/agentic_ran
-      - ./scripts:/app/scripts
-      - ./dataset:/app/dataset
+      - ./src:/app/src
+      - ./models:/app/models
+      - ./policies:/app/policies
+      - ./configs:/app/configs
       - ./shared_data:/app/shared_data
       - ./results:/app/results
 
   benchmark-main:
     <<: *common
     command: python -m src.benchmark --benchmark-scope main
 
-  benchmark-all:
-    <<: *common
-    command: python -m src.benchmark --benchmark-scope all
-
   benchmark-appendix:
     <<: *common
     command: python -m src.benchmark --benchmark-scope appendix
 
+  benchmark-all:
+    <<: *common
+    command: python -m src.benchmark --benchmark-scope all
+
   report:
     <<: *common
     command: python -m src.report
diff --git a/requirements.txt b/requirements.txt
@@ -2,3 +2,4 @@ numpy>=1.26
 pandas>=2.2
 matplotlib>=3.8
 torch>=2.3
+
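
Note (not part of the patch): the README and compose changes above route every run through `--benchmark-scope`, with the model lists now kept in `configs/benchmark_models.yaml`. This diff does not include `src/benchmark.py`, so the following is only a minimal sketch of how a scope value might be resolved to model names. The function `resolve_scope`, the `SCOPE_KEYS` mapping (in particular, treating `all` as main plus appendix), and the inline dict standing in for the parsed YAML are assumptions for illustration, not the repository's actual implementation.

```python
# Hypothetical sketch: src/benchmark.py is not shown in this diff.
# CONFIG mirrors configs/benchmark_models.yaml as it would look after parsing.
CONFIG = {
    "main_models": [
        "mlp_lightweight_32", "mlp_balanced_small", "mlp_with_time_features",
        "residual_mlp_128", "kan_baseline", "gradient_boosting_baseline",
        "agentic_residual_mlp", "graph_actor_critic_ran",
        "masked_graph_ppo_ran", "safegraphagent_ran",
    ],
    "appendix_models": [
        "attention_baseline", "liquid_baseline", "xlstm_baseline",
        "residual_tcn_16", "residual_tcn_32", "residual_liquid_tcn_16",
        "residual_liquid_tcn_32", "patchtst_baseline", "tsmixer_baseline",
        "agentic_liquid_residual", "agentic_sequence_attention",
        "agentic_patch_kan_mixer",
    ],
    "optional_foundation_models": [
        "chronos_bolt", "timesfm", "tiny_time_mixer", "moirai",
    ],
}

# Assumed mapping from --benchmark-scope values to config keys.
SCOPE_KEYS = {
    "main": ["main_models"],
    "appendix": ["appendix_models"],
    "foundation": ["optional_foundation_models"],
    # Assumption: "all" covers main + appendix; foundation models stay opt-in.
    "all": ["main_models", "appendix_models"],
}


def resolve_scope(scope, config=CONFIG):
    """Return the ordered list of model names for a benchmark scope."""
    if scope not in SCOPE_KEYS:
        raise ValueError(f"unknown benchmark scope: {scope!r}")
    models = []
    for key in SCOPE_KEYS[scope]:
        # A missing key is treated as an empty list rather than an error.
        models.extend(config.get(key, []))
    return models


if __name__ == "__main__":
    for scope in SCOPE_KEYS:
        print(scope, len(resolve_scope(scope)))
```

Keeping the scope-to-key mapping in one table makes it easy to check that each Docker service (`benchmark-main`, `benchmark-appendix`, `benchmark-all`) and each CLI scope listed in the README has a defined model set.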