YAML: fail-fast schema validation for agents/tasks config with actionable errors (+ praisonai validate) #2125

Description

@MervinPraison

Summary

The YAML configuration path — a primary way users define agents/tasks — has only shallow validation: it warns on misspelt agent field names and nothing else. Missing required fields, wrong value types, unknown tool references, and tasks pointing at non-existent agents are not caught at load time; they surface as confusing failures partway through execution (often after an LLM call has already been billed). For a YAML-first, production-oriented framework this is a robustness and "simpler to debug" gap.

Current behaviour

  • AgentsGenerator._validate_agents_config() (src/praisonai/praisonai/agents_generator.py) only iterates agents/roles, compares each field name against a hardcoded known_fields set, and logs a warning with a difflib suggestion. It does not:
    • validate tasks at all (only agents/roles),
    • check required fields are present,
    • check value types,
    • verify each task's agent references a defined agent,
    • verify referenced tools resolve (the ToolResolver chain exists but is not consulted during validation),
    • fail — everything is a non-fatal warning, so a broken config proceeds into execution.
  • There is no validate/lint command in the CLI command tree (cli/app.py).
  • Config is loaded with a bare yaml.safe_load (agents_generator.py:_load_config); structural errors are discovered late.
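
For illustration, the warning-only behaviour described above can be approximated with the sketch below. It is not the actual body of _validate_agents_config: the KNOWN_FIELDS contents, the function name, and the top-level config shape are assumptions made for this example.

```python
import difflib
import logging

logger = logging.getLogger(__name__)

# Illustrative subset only (assumption); the real known_fields set lives in
# agents_generator.py and is not reproduced here.
KNOWN_FIELDS = {"role", "goal", "backstory", "tools", "tasks", "llm"}


def warn_on_unknown_agent_fields(config: dict) -> list:
    """Log a warning per unknown agent field name; never raises."""
    warnings = []
    agents = config.get("agents") or config.get("roles") or {}
    for agent_name, fields in agents.items():
        if not isinstance(fields, dict):
            continue  # a wrong top-level type is skipped silently
        for field in fields:
            if field in KNOWN_FIELDS:
                continue  # value types are never inspected
            close = difflib.get_close_matches(field, sorted(KNOWN_FIELDS), n=1)
            hint = f" Did you mean '{close[0]}'?" if close else ""
            message = f"Unknown field '{field}' in agent '{agent_name}'.{hint}"
            logger.warning(message)
            warnings.append(message)
    # Tasks, required fields, and cross-references are not looked at.
    return warnings
```

With a check of this shape, a config whose task names an undefined agent, or whose role is an integer, yields no warning at all and proceeds into execution.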

Desired behaviour

  • Validate the full config (agents/roles and tasks/workflow) against a schema at load time and fail fast with a precise, actionable message: file, section, key, what is wrong, and how to fix it.
  • Validate cross-references: every tasks[].agent resolves to a defined agent; every tool name resolves through the existing ToolResolver; unknown keys are flagged, with the severity configurable (warning or error) for forward compatibility.
  • Add praisonai validate <file.yaml> (and --strict) to check a config without running it — ideal for pre-commit/CI.
  • Keep it backward compatible: unknown-field handling defaults to a warning during a deprecation window before becoming a hard error, consistent with the project's deprecation policy.
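
A minimal sketch of the cross-reference checks, under stated assumptions: the config is an already-loaded dict with a top-level tasks list, tool_exists is a hypothetical callable standing in for the ToolResolver lookup (its real API is not shown in this issue), and the known task keys are illustrative.

```python
from typing import Callable, Iterable


def check_cross_references(
    config: dict,
    tool_exists: Callable[[str], bool],  # hypothetical stand-in for ToolResolver
    strict: bool = False,
    known_task_keys: Iterable[str] = ("name", "description", "agent", "expected_output"),
):
    """Return (errors, warnings) without raising, so all problems are collected."""
    errors, warnings = [], []
    agents = config.get("agents") or config.get("roles") or {}

    # Every tool named by an agent must resolve.
    for agent_name, fields in agents.items():
        tools = fields.get("tools") if isinstance(fields, dict) else None
        for tool in tools or []:
            if not tool_exists(tool):
                errors.append(
                    f"agents.{agent_name}.tools: unknown tool '{tool}'; "
                    "register it or fix the name"
                )

    known = set(known_task_keys)
    for index, task in enumerate(config.get("tasks") or []):
        where = f"tasks[{index}]"
        if not isinstance(task, dict):
            errors.append(f"{where}: expected a mapping, got {type(task).__name__}")
            continue
        # Every task must point at a defined agent.
        agent = task.get("agent")
        if not isinstance(agent, str) or agent not in agents:
            errors.append(
                f"{where}.agent: {agent!r} is not a defined agent; "
                f"defined agents: {sorted(agents)}"
            )
        # Unknown keys warn by default and become errors in strict mode.
        for key in sorted(task.keys() - known):
            target = errors if strict else warnings
            target.append(f"{where}.{key}: unknown key; remove it or check spelling")
    return errors, warnings
```

Returning both lists, rather than raising on the first problem, keeps the deprecation-window behaviour (warn now, error later) a matter of one flag.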

Proposed approach

  • Define Pydantic models (or a JSON Schema) describing the agents/tasks/workflow config shape; reuse existing config dataclasses where possible to stay DRY.
  • Replace the warning-only _validate_agents_config with a validator that returns structured errors and raises a single aggregated, well-formatted error (list all problems at once, not one-at-a-time).
  • Wire cross-reference checks to the existing ToolResolver and the agent registry already built during generation.
  • Add cli/commands/validate.py registered in the lazy command table.
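
One possible shape for the structured, aggregated error is sketched below using only the standard library. The names ConfigIssue, ConfigValidationError, AGENT_SCHEMA, and validate_agents are hypothetical, and the hand-rolled schema dict is a placeholder for the Pydantic models or JSON Schema proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigIssue:
    """One problem: where it is, what is wrong, and how to fix it."""
    file: str
    section: str
    key: str
    problem: str
    fix: str

    def __str__(self) -> str:
        return f"{self.file}: {self.section}.{self.key}: {self.problem}. Fix: {self.fix}"


class ConfigValidationError(ValueError):
    """Raised once, carrying every issue found."""

    def __init__(self, issues):
        self.issues = list(issues)
        lines = [f"{len(self.issues)} problem(s) found in config:"]
        lines += [f"  {n}. {issue}" for n, issue in enumerate(self.issues, 1)]
        super().__init__("\n".join(lines))


# Hypothetical schema (assumption): field -> (expected type, required).
AGENT_SCHEMA = {"role": (str, True), "goal": (str, True), "tools": (list, False)}


def validate_agents(config: dict, file: str = "agents.yaml") -> None:
    issues = []
    for name, fields in (config.get("agents") or {}).items():
        if not isinstance(fields, dict):
            issues.append(ConfigIssue(
                file, "agents", name,
                f"expected a mapping, got {type(fields).__name__}",
                "indent the agent's fields under its name"))
            continue
        for key, (expected, required) in AGENT_SCHEMA.items():
            if key not in fields:
                if required:
                    issues.append(ConfigIssue(
                        file, f"agents.{name}", key,
                        "required field is missing",
                        f"add '{key}: <{expected.__name__}>'"))
            elif not isinstance(fields[key], expected):
                issues.append(ConfigIssue(
                    file, f"agents.{name}", key,
                    f"expected {expected.__name__}, got {type(fields[key]).__name__}",
                    f"change the value to a {expected.__name__}"))
    if issues:
        raise ConfigValidationError(issues)  # all problems in one message
```

Collecting issues first and raising once is what lets a user fix a broken file in a single pass instead of rerunning after each error.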

Layer placement

  • Primary layer: wrapper (src/praisonai/praisonai/)
  • Why not core: YAML is a wrapper concept (the resolver/agents_generator live in the wrapper); the protocol-driven core consumes already-constructed Python objects, not YAML, and must stay free of config-format logic.
  • Why not tools: validation is not an agent-callable integration.
  • Why not plugins: it is not a runtime lifecycle hook/guardrail; it runs at config load before any agent executes.
  • Secondary touch (optional): reuse existing core config dataclasses/Pydantic models as the schema source of truth to avoid duplication.
  • 3-way surface (CLI + YAML + Python): yes — CLI praisonai validate file.yaml; YAML is the validated artefact; Python PraisonAI(...).validate() / a validate_config(dict) helper.

Resolution sketch

agents_generator.py           # replace warning-only validator with fail-fast, aggregated errors
config/schema (wrapper)       # Pydantic/JSON-Schema for agents+tasks+workflow
cli/commands/validate.py      # `praisonai validate <file> [--strict]`
cli/app.py                    # register "validate" in _LAZY_COMMANDS
tool_resolver.py              # consulted to verify referenced tools resolve
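
The exit-code contract of the proposed command could look roughly like the sketch below. It uses argparse purely for illustration; how commands are actually registered in _LAZY_COMMANDS is not shown in this issue. find_problems is a hypothetical placeholder for the real schema validator, the loader is injectable so the sketch can be exercised without PyYAML, and exit code 2 for an unreadable file is an assumption.

```python
import argparse
import sys


def _load_yaml(path):
    import yaml  # PyYAML, imported lazily so the module loads without it

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}


def find_problems(config, strict=False):
    """Placeholder validator (assumption); returns a list of messages."""
    # `strict` is accepted for parity with the CLI flag but is unused here.
    if not isinstance(config, dict):
        return ["top level: expected a mapping of sections"]
    if not (config.get("agents") or config.get("roles")):
        return ["agents: no agents or roles defined; add an 'agents:' section"]
    return []


def main(argv=None, load=_load_yaml) -> int:
    parser = argparse.ArgumentParser(prog="praisonai validate")
    parser.add_argument("file", help="path to the YAML config to check")
    parser.add_argument("--strict", action="store_true",
                        help="treat warnings as errors")
    args = parser.parse_args(argv)

    try:
        config = load(args.file)
    except OSError as exc:
        print(f"{args.file}: cannot read file: {exc}", file=sys.stderr)
        return 2

    problems = find_problems(config, strict=args.strict)
    for problem in problems:
        print(f"{args.file}: {problem}", file=sys.stderr)
    if problems:
        return 1  # non-zero so pre-commit/CI fails
    print(f"{args.file}: OK")
    return 0
```

Returning the exit code from main, rather than calling sys.exit inside it, keeps the command easy to test and to wrap in whichever CLI framework the project uses. YAML syntax errors raised by the loader are not handled in this sketch.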

Severity

Medium — does not block valid configs, but turns a class of late, opaque mid-run failures into immediate, actionable load-time errors, materially improving production robustness and debuggability.

Validation

  • A config with a missing required field, a wrong type, a task referencing an undefined agent, and an unknown tool fails at load with one aggregated message listing all four problems and fixes — before any LLM call.
  • praisonai validate good.yaml exits 0; praisonai validate broken.yaml exits non-zero with the same actionable output.
  • Real agentic test: a valid YAML still runs end-to-end — praisonai run agents.yaml calls the LLM and produces output.
  • Backward compatibility: existing valid configs continue to load; unknown-field handling honours the deprecation window (warn → error).

Labels: bug (Something isn't working), claude (Auto-trigger Claude analysis)