Summary
On AMD Strix Halo (Ryzen AI Max+ 395), Foundry Local can run GPU models successfully, but NPU model load consistently fails and crashes Inference.Service.Agent.exe.
Environment
- OS: Windows 11 host (WSL2 Ubuntu 24.04 used only as operator shell)
- Hardware: AMD Ryzen AI Max+ 395 / Radeon 8060S
- Foundry Local: Microsoft.FoundryLocal_0.8.119.102_x64__8wekyb3d8bbwe
- NPU EP: MicrosoftCorporationII.WinML.AMD.NPU.EP.1.8_1.8.62.0_x64__8wekyb3d8bbwe
Reproduction
- Check the service state:
foundry service status
(service running, VitisAI EP listed)
- Run:
foundry model run qwen2.5-0.5b --device NPU --prompt "test" --ttl 120
- The failure happens during the load request.
Observed behavior
- CLI stderr includes:
Exception: Request to local service failed. Uri:http://127.0.0.1:<port>/openai/load/qwen2.5-0.5b-instruct-vitis-npu:3?ttl=120
An error occurred while sending the request.
- The Application event log repeatedly records a crash:
- Event ID: 1000
- Faulting app: Inference.Service.Agent.exe (0.8.119.102)
- Faulting module: onnxruntime_vitis_ai_custom_ops.dll (1.7.0.0)
- Exception code: 0xc0000005 (access violation)
- Fault offset: 0x00000000000246f5
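For anyone trying to confirm the same crash signature, here is a minimal sketch of how the Application log could be queried from Python. It assumes the stock Windows `wevtutil` tool is on PATH; the helper names `build_event_query` and `count_mentions` are hypothetical and are not part of Foundry Local.

```python
import subprocess

# Faulting module name taken from the crash record above.
FAULTING_MODULE = "onnxruntime_vitis_ai_custom_ops.dll"


def build_event_query(event_id=1000, count=5):
    """Build a wevtutil command for the newest `count` Application events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "Application",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # limit the number of events returned
        "/rd:true",      # newest events first
        "/f:text",       # plain-text output instead of XML
    ]


def count_mentions(text, module=FAULTING_MODULE):
    """Count case-insensitive occurrences of the module name in the log text."""
    return text.lower().count(module.lower())


if __name__ == "__main__":
    # Only meaningful on the Windows host, not inside WSL2.
    proc = subprocess.run(build_event_query(), capture_output=True, text=True)
    print(f"{count_mentions(proc.stdout)} mention(s) of {FAULTING_MODULE}")
```

A nonzero count after a failed NPU load would suggest the same faulting module is involved; it does not by itself prove the fault offset matches.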
Control test
The GPU path works on the same machine:
foundry model run qwen2.5-0.5b --device GPU --prompt "test" --ttl 120
The model loads and returns text successfully.
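The GPU/NPU comparison can be scripted so both runs are captured the same way. This is only a sketch: it wraps the exact `foundry model run` commands from this report, and the helper names `build_cmd` and `smoke` are hypothetical. The `run` parameter exists only so the function can be exercised without Foundry Local installed.

```python
import subprocess

MODEL = "qwen2.5-0.5b"  # model alias used in this report


def build_cmd(device, model=MODEL, prompt="test", ttl=120):
    """Build the foundry CLI invocation used in the reproduction steps."""
    return [
        "foundry", "model", "run", model,
        "--device", device,
        "--prompt", prompt,
        "--ttl", str(ttl),
    ]


def smoke(device, run=subprocess.run):
    """Run one smoke test and return its exit code and captured output."""
    proc = run(build_cmd(device), capture_output=True, text=True)
    return {
        "device": device,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # GPU first as the control, then the failing NPU path.
    for device in ("GPU", "NPU"):
        result = smoke(device)
        print(device, "exit code:", result["returncode"])
        if result["stderr"]:
            print(result["stderr"])
```

Running the control first makes it easier to tell a service-wide failure from one specific to the NPU path.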
Mitigations tried
- Service reset/init and smoke retest
- Tested two NPU model variants (qwen2.5-coder-0.5b, qwen2.5-0.5b)
- Repaired the Foundry package (initially blocked by in-use files, 0x80073d02; the repair completed after stopping the running processes)
- Retested after repair: the same NPU crash signature remains
Why this seems product/runtime related
The CPU/GPU paths are healthy on the same system, but the NPU path consistently crashes in onnxruntime_vitis_ai_custom_ops.dll during model load.
Artifacts available
I collected a full evidence bundle under:
C:\Users\Ken\foundry_fix_round1\
Key files:
ISSUE_BUNDLE_20260623.md
r2b_01_service_status.txt
r2b_02_versions.txt
r2b_03_npu_smoke_stdout.txt
r2b_03_npu_smoke_stderr.txt
r2b_04_events.txt
r3_06_gpu_smoke_stdout.txt
r3_14_npu_after_repair_stderr.txt
r3_15_events_after_repair.txt
If you want, I can upload zipped artifacts from this folder.