Skip to content

parallel agentic calls not supported by server #79

@pastoriomarco

Description

@pastoriomarco

Thanks for landing this in 0.7.0. I tested it on Jetson AGX Thor with nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 and single-stream throughput was excellent (~1.8× than same model in vllm), plus load times are much faster after first run.

Two observations from the quick tests I ran:

The server handles one request at a time. Sending two concurrently triggers an error, and the process needs a restart to recover.
Responses seem to return text only. The OpenAI-standard tool_calls field doesn't seem to be populated yet, I rely on that for agentic workflows.
Let me know if there's something I might be missing that could cause these problems if they are known and to be solved.
I can provide my setup if useful

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions