Surrogate model + sensitivity analysis over sim_config parameter space #78

Description

@biosynthart

Goal

Train a surrogate model that predicts ecosystem outcomes from config, and use it to identify which of the ~48 tunable parameters actually matter. This enables cheap exploration of the parameter space without running the full physics simulation.

Background

The ecosim engine has ~299 constants across 8 source files, but only ~48 are truly tunable (38 in sim_config.json + 10 world rates). Many may be dead weight — they only matter in edge cases that rarely trigger. Sensitivity analysis tells us which params to expose and which to freeze.

See: docs/ECOSIM_PARAMETER_TELEMETRY_SPACE.md § "Concrete Use Cases A+C"

Two Sub-Tasks

1. Surrogate Model

  • Inputs: ~48 tunable params + species trait vectors + biome ID (one-hot or embedding)
  • Targets: aggregate metrics at T=1000, T=5000, T=10000:
    • Biodiversity index (species still alive / initial count)
    • Time-to-collapse (tick when population drops below threshold)
    • Carrying capacity (peak population per species)
    • Event rate summary (deaths, reproductions, predations normalized by tick count)
  • Model: MLP or Gaussian Process — input space is small (~50D), output is ~10 metrics
  • Training data: 1000–10000 runs via Sobol/Latin hypercube sampling, produced by the batch runner from #77 (Telemetry emitter: config snapshot + time-series aggregates + event batching)
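
The sampling step above could look roughly like the following. This is a minimal stdlib-only sketch of Latin hypercube sampling, not the batch runner from #77; the parameter names and ranges in `BOUNDS` are hypothetical placeholders, and a real run would read them from sim_config.json.

```python
import random


def latin_hypercube(bounds, n_samples, seed=0):
    """Draw n_samples configs via Latin hypercube sampling.

    Each parameter's range is split into n_samples equal strata and
    every stratum receives exactly one sample.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in bounds.items():
        # One uniform draw inside each stratum of the unit interval.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle independently per parameter so strata are paired at random.
        rng.shuffle(points)
        columns[name] = [lo + p * (hi - lo) for p in points]
    return [
        {name: columns[name][i] for name in bounds}
        for i in range(n_samples)
    ]


# Hypothetical parameter names and ranges, for illustration only.
BOUNDS = {
    "metabolism_rate": (0.1, 2.0),
    "reproduction_threshold": (10.0, 100.0),
    "predation_efficiency": (0.0, 1.0),
}

configs = latin_hypercube(BOUNDS, n_samples=8)
```

Each entry in `configs` is one config dict to hand to a simulation run. A Sobol sequence would be a drop-in alternative for the same role.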

2. Sensitivity Analysis

  • Run Sobol or Morris screening over the tunable space
  • Train random forest on config → outcome, extract feature importances
  • Produce a ranked list: which params have the highest total-effect index?
  • Result: prune sim_config.json down to the params that actually matter
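
As an illustration of the screening idea, here is a simplified Morris-style sketch using a radial one-at-a-time design rather than full Morris trajectories. It assumes all parameters are rescaled to the unit interval; `toy_surrogate` and the parameter names `a`, `b`, `c` are hypothetical stand-ins for the trained surrogate and the real config keys.

```python
import random


def morris_mu_star(model, names, n_base=50, delta=0.1, seed=0):
    """Mean absolute elementary effect (mu*) per parameter.

    For each random base point, each parameter is stepped by `delta`
    on its own and the resulting change in the model output is recorded.
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in names}
    for _ in range(n_base):
        # Base point in the unit cube, leaving room for the +delta step.
        base = {name: rng.random() * (1.0 - delta) for name in names}
        y0 = model(base)
        for name in names:
            stepped = dict(base)
            stepped[name] += delta
            totals[name] += abs(model(stepped) - y0) / delta
    return {name: totals[name] / n_base for name in names}


def toy_surrogate(x):
    # Hypothetical stand-in for the trained surrogate.
    # Parameter "c" is deliberately unused to mimic a dead-weight param.
    return 5.0 * x["a"] + x["a"] * x["b"]


mu_star = morris_mu_star(toy_surrogate, ["a", "b", "c"])
# Highest mu* first: the candidates to keep exposed in the config.
ranking = sorted(mu_star.items(), key=lambda kv: kv[1], reverse=True)
```

A parameter with mu* near zero across the sampled space is a candidate for freezing. The total-effect index mentioned above would come from a full Sobol analysis, which this sketch does not compute.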

Module Structure

scripts/train_surrogate.py          # train MLP/GP on telemetry data
scripts/sensitivity_analysis.py     # Sobol/Morris screening + feature importance plot
server/ecosim/metrics.py            # outcome metric functions (biodiversity, collapse time, etc.)
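
A possible shape for the metric functions in server/ecosim/metrics.py is sketched below. The input format (a dict mapping species name to a per-tick population list) and the function names are assumptions, as is the choice to measure collapse on the total population across species.

```python
def biodiversity_index(populations, tick):
    """Fraction of initially present species still alive at `tick`."""
    initial = [s for s, series in populations.items() if series[0] > 0]
    if not initial:
        return 0.0
    alive = sum(1 for s in initial if populations[s][tick] > 0)
    return alive / len(initial)


def time_to_collapse(populations, threshold):
    """First tick where the total population drops below `threshold`.

    Returns None if the population never drops below the threshold.
    """
    n_ticks = min(len(series) for series in populations.values())
    for tick in range(n_ticks):
        if sum(series[tick] for series in populations.values()) < threshold:
            return tick
    return None


def carrying_capacity(populations):
    """Peak population reached by each species."""
    return {s: max(series) for s, series in populations.items()}


# Tiny made-up time series, for illustration only.
populations = {
    "rabbit": [10, 14, 9, 3, 0],
    "fox": [4, 5, 6, 2, 1],
}

index_at_end = biodiversity_index(populations, tick=4)    # 0.5
collapse_tick = time_to_collapse(populations, threshold=6)  # 3
peaks = carrying_capacity(populations)  # {"rabbit": 14, "fox": 6}
```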

Design Constraints

Acceptance Criteria

  • scripts/train_surrogate.py loads telemetry JSONL, trains MLP, saves model
  • Surrogate predictions stay within 20% MAE of full physics runs on key metrics
  • scripts/sensitivity_analysis.py produces ranked feature importance plot
  • server/ecosim/metrics.py defines reusable outcome metric functions
  • Document which params are high-importance vs. dead-weight in the doc
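
The 20% MAE criterion could be checked per metric along these lines. Since MAE is not itself a percentage, this sketch assumes the threshold means MAE divided by the mean magnitude of the physics-run values; that interpretation, the function names, and the sample numbers are all assumptions.

```python
def relative_mae(y_true, y_pred):
    """MAE divided by the mean absolute value of the true outcomes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    scale = sum(abs(t) for t in y_true) / len(y_true)
    if scale == 0:
        raise ValueError("true values are all zero; relative MAE is undefined")
    return mae / scale


def meets_criterion(y_true, y_pred, max_relative_mae=0.20):
    """True if the surrogate error is under the acceptance threshold."""
    return relative_mae(y_true, y_pred) < max_relative_mae


# Made-up biodiversity index values: physics runs vs. surrogate predictions.
physics = [0.8, 0.5, 0.6, 0.9]
surrogate = [0.7, 0.6, 0.6, 0.8]

error = relative_mae(physics, surrogate)
passed = meets_criterion(physics, surrogate)
```

Evaluating this on held-out runs, rather than the training set, is what makes the number meaningful as an acceptance check.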

Related

Metadata

Labels

shelved: Deferred until after scalability work
