[WIP] Azure AI Agents: add next-step guidance and doctor diagnostics by antriksh30 · Pull Request #8198 · Azure/azure-dev

antriksh30 · 2026-05-15T04:00:03Z

Summary

add context-aware Next: guidance for the Azure AI Agents extension across init, run, invoke, show, deploy-hook, and doctor flows
add azd ai agent doctor with local and remote checks for project setup, environment variables, authentication, Foundry reachability/RBAC, hosted agent status, identity roles, model deployments, connections, and toolboxes
improve post-deploy guidance with README-first sample payload hints, omit active-agent show next steps, and stream text-mode doctor checks as they complete

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add the foundation for context-aware `Next:` guidance described in PR Azure#8057: - New `internal/cmd/nextstep` package with `Suggestion`, `State`, `ServiceState`, `AuthState` types and a format-agnostic `PrintNext` writer that aligns commands on the longest entry and caps output at one primary + one secondary line. - Add an `isTerminal(fd uintptr) bool` helper in `internal/cmd/helpers.go` wrapping `golang.org/x/term`; promote that module from indirect to direct in `go.mod`. - Register `nextstep` in the repo cspell dictionary. No callers yet; resolvers, state assembly, and command wiring land in subsequent commits. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Introduces nextstep.AssembleState, a single best-effort probe of the current azd world that the resolver (next commit) will read from. It captures three things the design relies on: 1. Whether AZURE_AI_PROJECT_ENDPOINT is set in the active environment (HasProjectEndpoint). 2. The agent services declared in azure.yaml, in alphabetical order (Services). 3. For each service, whether azd recorded a successful deploy. The signal is AGENT_<KEY>_VERSION non-empty in the env, matching the convention written by registerAgentEnvironmentVariables in service_target_agent.go. KEY is derived via the same spaces+hyphens-to-underscore upper-case transform getServiceKey uses (lines 222-226 of service_target_agent.go). Probes are best-effort: transport errors are collected and returned alongside a partial State so resolvers can still degrade gracefully (e.g., suggest azd init when project load fails). A small Source interface decouples the assembler from *azdext.AzdClient so tests can be hand-rolled fakes; production wraps the real client via NewSource. WithAuthProbe / WithOpenAPIProbe options are plumbed but inert until commit 1.3 / 1.4 land keeps the public API stable from day one so callers and tests don't need rewriting later. Plan refs closed: D4 (IsDeployed rule). Closes the data-gathering half of Phase 1 commit 1.2. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AssembleState's collectServices iterated every azure.yaml service, so a project with mixed hosts (e.g. one agent + one containerapp web tier) would have leaked the web service into nextstep's view and triggered spurious AGENT_<KEY>_VERSION env lookups for it. Filter at the boundary on Host == agentHost (mirrors the cmd.AiAgentHost literal; intentional duplication to keep nextstep importable from cmd without a cycle). Tests: existing fixtures updated to use the canonical host; new 'non-agent services are filtered out' case pins the behavior; TestAgentHostConstant pins the literal to guard against drift. Resolves the F1 finding from the cross-pollinated review of 5ab18b7 (3/3 reviewer consensus). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…error vocabulary Phase 1 commit 1.3. Three pure-Go source files plus tests, all under `internal/cmd/nextstep/`. No callers yet; nothing prints. Wires the remaining "decide what to print" machinery so Phase 2 commits can swap out the hardcoded hint blocks in init/run/invoke/show/deploy. resolver.go Pure decision functions over *State, one per command outcome: - ResolveAfterInit - ResolveAfterRun - ResolveAfterInvoke (success + typed failure) - ResolveAfterShow - ResolveAfterDeploy Filesystem and OpenAPI-cache access flow through caller-injected closures (cachedPayload, readmeExists) so the resolver stays pure and unit-testable. No I/O, no globals. openapi.go - ExtractInvokeExample(spec []byte) string: walks paths./invocations.post.requestBody.content.application/json with explicit $ref short-circuit at both requestBody and schema levels. Resolution order: content.example -> schema.example -> generated from required+properties[*].example -> "". Silent on any miss. - ReadCachedOpenAPISpec(configDir, agentName, suffix): mirrors the writer-side path shape from helpers.go (fetchOpenAPISpec) so the two stay in lockstep. Returns (nil, nil) on os.ErrNotExist. error_codes.go Typed wire-level vocabulary, sourced verbatim from the vienna platform's authoritative enums: - UserErrorCode (HostedAgentVersionManager.cs) - SessionErrorCode (Session/Exceptions/SessionErrorCode.cs) - AgentVersionStatus (Contracts/V2/Generated/Agents/.../...Status.cs) Plus RemediationForUserErrorCode / RemediationForSessionErrorCode helpers returning the platform's troubleshooting URL + suggestion text. Surfaces codes verbatim; no re-classification. The platform appends its own aka.ms TSG link via WithTroubleshootingInfo, so the extension just passes Code + Message through. Strategy delta D5 (will be recorded in STRATEGY-DELTA.md): the plan assumed cache path .azd/agent-cache/<env>-<agent>-openapi.json; the actual writer in helpers.go:317-374 uses <configDir>/openapi-<safeName>-<suffix>.json where safeName runs strings.ReplaceAll on "..", "/", and "\\". The reader mirrors that shape byte-for-byte so the two halves never drift. Tests cover every branch in every resolver, every $ref-short-circuit path in the extractor, the writer/reader sanitization contract, every remediation arm in the error_codes mapping, and pin every wire-level string against the platform contract (so a typo in a Go const can't silently diverge from what the service emits). Closes plan items C5, C11 (foundation). Sets up the Phase 2 callers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Three findings emerged from the 3-model code review of commit 0b39575 (Opus 4.7 xhigh, Sonnet 4.6, GPT-5.5) and were corroborated via cross-pollination across the reviewers. Three were adopted; one was dropped after the author empirically tested the affected shell. F-A: shell-escape single quotes in OpenAPI-derived payloads. resolver.go lines 104 and 313 wrapped state.OpenAPIPayload / cached payload in single quotes via raw concatenation. The payload comes from json.Marshal in ExtractInvokeExample, which does not escape apostrophes, so an OpenAPI example such as {"q":"don't"} terminated the surrounding single-quoted shell argument and broke the suggested invoke command. Introduce shellEscapeSingleQuoted using the POSIX '\'' idiom and route both sites through it. Cross-pollinated: 3 of 3 reviewers concurred. F-B: honor ServiceState.Protocol in ResolveAfterShow Active branch. The Active case unconditionally passed ProtocolResponses to invokeCommandFor, so an invocations-protocol agent was suggested the responses-style "Hello!" payload (which the agent rejects). Look up the service via findService and default to ProtocolResponses only on miss. Existing test asserted only a substring containing "azd ai agent invoke echo", which passed for either payload that is why the bug slipped past code review on 1.3. Replace the substring assertion with exact matches and add explicit subtests for invocations vs responses. Cross-pollinated: 3 of 3 concurred. F-D: populate ServiceState.Protocol from agent.yaml in collectServices. The Protocol field was declared in types.go but never written by the production code path, so F-B's lookup would have silently fallen back to ProtocolResponses for every agent in real use. Add loadServiceProtocol(projectPath, relativePath) that reads <root>/<rel>/agent.yaml, parses agent_yaml.ContainerAgent, and picks ProtocolResponses when declared (broadest compatibility), ProtocolInvocations when only invocations is declared, or "" on any error. All failure modes are silent the resolver degrades to responses-default rather than surfacing transient I/O errors through the next-step hint. Cross-pollinated: Opus, Sonnet, and GPT-5.5 all confirmed the field was production-dead. F-C dropped: bash !" history expansion. Sonnet flagged that "Hello!" would trigger bash history expansion. Opus empirically refuted by running bash 5.1.16: !" is not a history designator and bash leaves it literal. GPT-5.5 confirmed on cross-pollination. No change. Tests: TestResolveAfterRun gains an apostrophe-in-payload case. TestResolveAfterDeploy gains an apostrophe-in-payload case. TestResolveAfterShow Active row split into an explicit substring assertion plus three subtests asserting protocol-driven payload selection. TestLoadServiceProtocol covers single/multi/empty/malformed manifests and missing files. TestAssembleState_PopulatesProtocolFromAgentYaml exercises the end-to-end path on a temp dir. No user-visible change yet; resolvers remain wired only to themselves. Phase 2 will surface the corrected suggestions to real users when init.go is the first caller. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous doc comment named the POSIX escape idiom literally using backtick-delimited examples that included backslash-apostrophe sequences. Those byte sequences proved fragile through PowerShell heredoc / editor format-on-save round-trips, and ended up showing U+201D smart-quotes in the committed file instead of the intended ASCII characters. A user reading the comment would also have been misled: the names given (after the smart-quote substitution) did not match what the function actually emits on line 397. Rewrites the comment in prose, anchoring the byte-pattern reference to the implementation line (which uses a Go raw string so the literal cannot be mangled). Also restates the PowerShell adaptation guidance in terms of PowerShell's own two-consecutive-apostrophes convention instead of referencing the POSIX byte pattern. 3-of-3 reviewer consensus on the underlying finding (Sonnet flagged the original; Opus and GPT-5.5 cross-pollinated confirmation). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…uoted doc The previous doc rewrite pointed at "line 397" for the byte pattern, but in the committed file line 397 is mid-paragraph prose about json.Marshal. The actual implementation line moved to 404 once the prose rewrite expanded the comment by six lines. A reader following the cross-reference would land in the wrong place. Drops the line-number reference in favor of "the implementation below uses a Go raw string for that sequence so its byte pattern is stable across edits." Hard-coded line numbers inside the same file are inherently fragile and should be avoided. 3-of-3 reviewer consensus on the stale-reference finding: GPT-5.5 and Opus and Sonnet independently flagged it on the 787145a review pass. Fix mirrors what all three reviewers suggested. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…API probe Refactors fetchOpenAPISpec so callers control the "OpenAPI spec saved to %s" output, and wires the previously-placeholder WithOpenAPIProbe option in the nextstep package to actually populate State.HasOpenAPI / OpenAPIPayload from the on-disk cache the invoke flow writes. Closes critique items C5 (silent fetch) and C6 (probe wiring) from the implementation plan. No user-visible behavior change in this commit; the "OpenAPI spec saved to ..." line still surfaces on fresh writes from invoke, and stays silent on cache hits and errors. helpers.go - fetchOpenAPISpec now returns (specFile string, fresh bool). fresh==true means this call wrote a new spec to disk; fresh==false means cache hit OR any failure. Callers print the "saved to" line gated on fresh; future callers (doctor, run-time probe) that want silence simply ignore the bool. The print is no longer inside the helper. invoke.go - Both call sites (local fresh fetch, remote conditional fetch) now emit the "OpenAPI spec saved to %s" line themselves via the (path, fresh) return. Behavior is byte-identical to before; only the ownership of the print moved. nextstep/state.go - WithOpenAPIProbe(enabled bool) becomes WithOpenAPIProbe(agentName, suffix string). Empty agentName or suffix disables the probe (the zero value). - assembleState now runs a strictly cache-only OpenAPI lookup when the probe is enabled and the project + env name are both known. configDir is computed as filepath.Join(project.Path, ".azure", envName) the same directory fetchOpenAPISpec writes into, so reader and writer stay in lockstep without an extra round-trip to the gRPC source. Cache miss, malformed spec, no extractable payload all silently leave HasOpenAPI=false and the resolver falls back to the protocol-generic <payload> literal. nextstep/state_test.go - TestOptionsApplyCleanly updated for the new WithOpenAPIProbe shape. - TestWithOpenAPIProbe_EmptyArgsDisableProbe pins the disabled-default semantics (empty agentName / suffix means probe is off). - TestAssembleState_WithOpenAPIProbe_PopulatesPayloadFromCache exercises the happy path: a real on-disk spec under .azure/<env>/ produces a populated State.OpenAPIPayload via ExtractInvokeExample. - TestAssembleState_WithOpenAPIProbe_MissingCacheLeavesPayloadUnset pins the cache-miss fallback. - TestAssembleState_WithOpenAPIProbe_DisabledWhenAgentEmpty proves an on-disk cache is ignored when the option is called with an empty agentName, so callers can centrally disable the probe. Records strategy delta D9 (fetchOpenAPISpec silencing shape) and D10 (WithOpenAPIProbe shape) in .tmp/pr-8057/STRATEGY-DELTA.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…assembly `State.MissingInfraVars` / `State.MissingManualVars` were declared in commit 1.2 but never populated; the resolver branches in commit 1.3 that consume them only ever saw nil slices. This commit adds the detection step inside `assembleState` so the resolver can suggest the right next action when the user has unprovisioned `${VAR}` references in any agent.yaml. What the helper does - For every azure.ai.agent service in `azure.yaml`, opens the matching `<projectPath>/<svc.RelativePath>/agent.yaml` and walks the `environment_variables` block. - Extracts unique `${VAR}` references via a small package-level regex (`envVarRefPattern`). The optional `(?::-[^}]*)?` non-capturing tail tolerates POSIX-style defaults like `${VAR:-fallback}` without pulling them into the captured name. - Looks each name up against the current azd environment. Names whose value is set are skipped. Names whose value is unset get partitioned: - leading `AZURE_` -> `MissingInfraVars` (`azd provision` outputs in the AI Foundry templates uniformly start with this prefix: `AZURE_AI_*`, `AZURE_OPENAI_*`, `AZURE_SUBSCRIPTION_*`, etc.) - everything else -> `MissingManualVars` (`azd env set` candidates) - Results are deduplicated cross-service (so two services referencing `${AZURE_AI_PROJECT_ENDPOINT}` collapse to one entry) and returned sorted ascending, matching the existing `slices.Sorted` style. Error / partial-state behavior - agent.yaml read or parse errors are silent (return nil refs). The resolver falls back to its default branch rather than emitting guidance about variables we cannot prove are needed. - `src.EnvValue` transport errors append to `*errs` so the snapshot caller can surface them in --debug output, but never abort. This mirrors the existing `isDeployed` contract. - `detectMissingVars` is only invoked when both `project != nil` and `envName != ""`; otherwise both lists stay nil and the existing resolver code paths are unaffected. Why classification is `AZURE_` prefix only The heuristic is intentionally coarse. Documented in the helper godoc: misclassifying a manual var as infra at worst points the user at `azd provision` instead of `azd env set`; the inverse still yields an actionable hint. A future commit can swap in a richer rule (consult `main.bicep` outputs, project-level allow-list) without touching the public API of `AssembleState`. Why split this from the init.go wiring (commit 2.2) The resolver's "no MissingVars" branch suggests `azd ai agent run`, which fails for an unprovisioned env. Wiring init.go without first populating MissingVars would be a behavior regression versus the old hardcoded `azd up` hint. Splitting also keeps each commit reviewable in isolation: 2.1 is pure state-assembly logic with no command wiring, 2.2 is a small swap-in at the call site. Tests added in state_test.go - TestExtractAgentYamlEnvRefs: table with 7 cases covering bare refs, defaulted refs, multiple-refs-per-value, cross-value dedupe, no env block, literal-only values, malformed YAML. - TestExtractAgentYamlEnvRefs_MissingFileOrArgs: empty args + missing manifest all return nil. - TestAssembleState_PopulatesMissingVars: end-to-end via assembleState with a real agent.yaml fixture mixing set + unset infra + manual vars. - TestAssembleState_MissingVarsDedupedAcrossServices: two services with overlapping refs collapse to one entry each list. - TestAssembleState_AllVarsSetLeavesMissingEmpty: regression guard for the "everything provisioned" path. - TestAssembleState_MissingVarTransportErrorSurfaced: EnvValue errors propagate to errs slice without crashing or mis-populating. No production caller of `AssembleState` exists yet, so runtime behavior is unchanged. Commit 2.2 swaps init.go to call the resolver, at which point the populated state takes effect. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tection Before this change, an agent.yaml ref written as `${VAR:-fallback}` would classify VAR as missing whenever it was unset in the azd environment, and the resolver would prompt the user to `azd provision` or `azd env set` it. That hint is misleading: the deploy-time expander (drone/envsubst, used by service_target_agent.go) honors the `:-` default, so the deploy succeeds with the fallback value and the user has no real action to take. Fix: make the regex's default-tail group capturing (`(:-[^}]*)?`) and skip matches where group 2 is non-empty. Bare `${VAR}` still surfaces as missing when unset, matching the runtime requirement. Bare-dash `${VAR-fallback}` (POSIX "if unset, use fallback") continues to be silently dropped — its deploy-time semantics also carry a fallback, so the same user-visible result holds. Tests: * `TestExtractAgentYamlEnvRefs` table: rename + flip "reference with default tail captured as bare name" -> "reference with default tail is skipped" (want: nil). Add "bare ref alongside defaulted ref returns only the bare one". * New `TestAssembleState_DefaultedRefsAreExcludedFromMissingVars` end- to-end: agent.yaml mixes one bare unset ref (must surface) with two defaulted unset refs (must NOT surface, including the manual-vars bucket). Confirms the partition stays correct when only AZURE_AI_ refs would have surfaced through the infra heuristic. Reviewer consensus (2/3): Sonnet's option (b) — drop the regex-broadening half of its companion finding and keep this change minimal. GPT-5.5 originated the misleading-hint observation; Sonnet cross-pollinated and recommended this exact path. Opus REJECTed with the position that the deploy-time hint is "wrong but right" (template intent), which holds for template-supplied AZURE_ refs but breaks for manual vars such as `${MY_API_KEY:-dev-fallback}`. Tie-breaker: the manual-vars case. Verified clean against gofmt, go vet, go build, go test ./internal/cmd/nextstep/..., golangci-lint, cspell, copyright-check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace the hardcoded `azd up` / `azd deploy <svc>` conditional at init.go:1592-1607 with a call to nextstep.AssembleState + ResolveAfterInit + PrintNext. The resolver inspects the active azd environment plus each azure.ai.agent service's agent.yaml to emit context-aware guidance: - MissingInfraVars -> `azd provision` + trailing `azd deploy` - MissingManualVars -> up to 3 `azd env set <KEY> <value>` lines - clean -> `azd ai agent run` + trailing `azd deploy` First user-visible behavior change in this PR. The legacy AZURE_AI_PROJECT_ID dichotomy is replaced by the more informative missing-vars partition; the new trailing line is the generic `azd deploy` (no service-name suffix) per the design spec. State-assembly errors are intentionally ignored: the resolver degrades gracefully on partial state per the design spec. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…follow-up nudges 3-of-3 reviewer consensus on commit 077c550 surfaced that PrintNext silently truncates ResolveAfterInit's trailing `azd deploy` line when there are 2+ missing manual vars. The resolver assigns the trailing nudge Priority 90 but the renderer sorts ascending and caps at maxRendered=2; once the manual-vars branch emits 2 or 3 `azd env set` lines (priorities 20-22) the deploy nudge is the first thing dropped. Fix: add a Trailing flag to Suggestion. renderBlock now partitions on the flag and reserves one of its maxRendered slots for the lowest- priority trailing entry. Primary suggestions fill the remaining slots in ascending Priority order, as before. ResolveAfterInit marks its `azd deploy` footer Trailing:true; other resolvers are unchanged (none of them currently emit a structural footer). Net effect for end users finishing `azd ai agent init` with N missing manual variables: N=1 -> `azd env set X` + `azd deploy` (unchanged) N=2 -> `azd env set A` + `azd deploy` (was: A + B, deploy lost) N=3 -> `azd env set A` + `azd deploy` (was: A + B, deploy lost) The user is named one missing variable plus the deploy nudge. The previous behavior was equally lossy -- it just dropped the wrong thing. Naming every missing var would need a higher maxRendered, which trades the design's two-line UX cap for completeness; the design spec chose the cap, so the fix preserves it. Coverage: - TestPrintNext gains "trailing suggestion survives truncation", "trailing-only block renders as the single line", and "multiple Trailing entries collapse to the lowest-priority one". - TestResolveAfterInit (table) + TestResolveAfterInit_ManualVarsCapAtThree now assert `out[len(out)-1].Trailing == true`. Reviewer provenance: Opus 4.7 xhigh (High, empirically reproduced), Sonnet 4.6 (Medium), GPT-5.5 (Medium) all independently surfaced the same truncation bug on the b391886..077c550 diff; 3/3 consensus, no cross-pollination needed. Opus's Option B (sticky tail) is the approach implemented here -- the alternatives (cap manual-var lines to 1; introduce a renderer-limit parameter) either lose more user info or pollute the API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…deferred footer Cross-pollinated 3-model review on 8fa72db flipped the Trailing-tiebreaker policy. Previous implementation used "first Trailing wins" (lowest Priority on ascending sort). That defeats the regression-prevention purpose of the sticky- tail fix: if a future resolver accidentally flags a Priority < 90 entry as Trailing, current code silently drops the intended `azd deploy` footer (Priority 90) the exact regression 2.2.1 was meant to prevent. Switch to "last Trailing wins" (highest Priority on ascending sort = most- deferred footer). Mistake-likelihood is asymmetric: copy-pasting Trailing onto a low-priority hint is plausible; inventing a higher-than-deploy Priority is not. Reviewers ratifying last-wins: Sonnet 4.6 (original finder, Medium), GPT-5.5 (swung after cross-pollination), Opus 4.7 xhigh (reversed his initial Q5 ratification after cross-pollination). 3/3 consensus. Changes: - format.go: remove `if trailing == nil` guard so every Trailing entry overwrites; the loop terminates with a pointer to the highest-Priority Trailing entry. - types.go: docstring now spells out "highest Priority wins on collision". - format_test.go: rename "multiple Trailing entries collapse" case and pin `tail-b` (Priority 90) as the survivor instead of `tail-a` (Priority 80). Preflight: gofmt clean, go vet clean, go build clean, nextstep tests pass (10.9s), cmd tests pass (15.5s), golangci-lint 0 issues, cspell 0 issues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace the hardcoded `azd ai agent invoke --local "Hello!"` follow-up hint with the new nextstep package. The resolver picks a protocol- appropriate sample payload (`{"message": "Hello!"}` for invocations protocol agents; `"Hello!"` literal for responses protocol agents) and, when a cached OpenAPI spec from a prior `invoke` is available, replaces the default with the exact request body the agent expects. A tip line pointing at the OpenAPI doc is appended when the cache is empty. Smoke-tested against the hello-world-python-invocations sample (Foundry bring-your-own template). Output: Next: azd ai agent invoke --local '{"message": "Hello!"}' -- send a sample request to the running agent curl http://localhost:<port>/invocations/docs/openapi.json -- tip: inspect the spec to learn the agent's exact payload Starting agent on http://localhost:18347 (Ctrl+C to stop) The `After startup, in another terminal, try:` preamble is dropped in favor of consistency with the `init` success path (`init.go:1607`): the `Next:` header + the `Starting agent on <url> (Ctrl+C to stop)` line directly below convey the temporal ordering. If user testing shows confusion, a follow-up commit can wire `After startup...` text back in via the resolver's Description column. Out of scope: the `<port>` placeholder in the curl tip is a known documentation-grade hole. Substituting the live port requires plumbing `flags.port` through `State.Port` and the resolver deferred to keep this commit small. Preflight: gofmt clean, vet clean, build clean, cmd tests pass (14.9s), golangci-lint 0 issues, cspell 0 issues. Smoke-tested end-to-end. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… C9) Adds the 7th local doctor check, surfacing the same MissingManualVars signal that the post-init `Next:` block renders — but at any point in the flow (post-deploy, mid-debug, before reporting a bug). Closes the "manual config values missing" branch from issue Azure#7975 lines 117-127 on the doctor side. ## Why The doctor's job is to short-circuit user frustration: "I get the same guidance from `azd ai agent doctor` regardless of where I am in the flow." Today the manual-vars signal only surfaces at the tail of `azd ai agent init`, so a user who hits the issue mid-cycle (e.g. they cloned a friend's project, ran `azd env select`, and went straight to `azd ai agent run`) sees no clear pointer to the missing config. ## What - `internal/cmd/doctor/checks_manual_env.go` — new check produces ID `local.manual-env-vars`, name "manual env vars set". On Pass, message is "no manual env vars are missing"; on Fail, lists every missing var in the message and stages a paste-ready `azd env set <FIRST> <value>` suggestion (sorted alphabetically — the renderer paints Suggestion as a single line, multi-line breaks indentation). When 2+ vars are missing the suggestion adds a "Repeat for each of the other variables listed above." clause; when exactly 1 is missing the bare command is the suggestion (no misleading "and others" wording). - `internal/cmd/doctor/checks_local.go` — adds the check as the 7th entry in `NewLocalChecks` (after `local.agent-yaml-valid`); introduces a lowercase `assembleState` field on the exported `Dependencies` struct as a test seam (production code leaves it nil; tests inject directly). - Skip cascade: skips when AzdClient is nil OR when `local.agent-yaml-valid` failed/skipped OR when `local.environment-selected` failed/skipped. The first guard transitively covers azure-yaml → agent-service-detected → agent-yaml-valid; the second is required because env-selection is a sibling chain (not upstream of agent-yaml-valid) and `nextstep.AssembleState` silently early-exits its detectMissingVars block when no env is selected (state.go: `if project != nil && envName != ""`). Without the env guard the check would falsely Pass in a state where no env values were ever examined. ## Architecture: why reuse `nextstep.AssembleState` The "manual vs infra" classification logic lives in nextstep — the same place that drives every other `Next:` recommendation. Adding a separate classifier inside the doctor would split the source of truth, and future improvements (e.g. C1's Bicep-output discovery replacing the `AZURE_` prefix shortcut) would have to be ported twice. Forwarding to `AssembleState` keeps the doctor as a presentation layer. ## Tests 12 new sub-tests in `checks_manual_env_test.go`: - No client → Skip - Prior `local.agent-yaml-valid` failed → Skip - Prior `local.agent-yaml-valid` skipped → Skip (cascade propagation) - Prior `local.environment-selected` failed → Skip (regression for Opus xhigh review HIGH finding; asserts assembler is NOT called via t.Fatalf in the fake) - Prior `local.environment-selected` skipped → Skip (symmetric) - 0 missing → Pass with "no manual env vars are missing" - 1 missing → Fail with bare `azd env set <NAME> <value>` suggestion (asserts no "Repeat" / "likewise" wording — regression for Sonnet 4.6 review MEDIUM finding) - 4 missing → Fail with sorted list; suggestion has "Repeat for each of the other variables" clause - nil State from assembler → Fail with assembly error message - nil State + nil errs → Fail with fallback "unknown error" message - Non-fatal errs but State populated → Pass (state.MissingManualVars is the authoritative signal; ancillary errors like missing AI_AGENT_PENDING_PROVISION key don't dirty the result) - Existing `TestNewLocalChecks_OrderAndIDs` extended from 6 to 7 checks; new `TestNewLocalChecks_IncludesManualEnvVarsLast` pins the agent-yaml-valid → manual-env-vars ordering invariant so a future reorder can't silently break the skip-cascade. ## Out of scope - Bicep-output classifier (B1 fix / C1) — `assembleState` today routes non-`AZURE_*` Bicep outputs (e.g. `TOOLBOX_*`) into MissingManualVars rather than MissingInfraVars. This check will surface that bug verbatim until C1 lands; that's the correct phasing because (a) every Phase 5 consumer benefits from the same fix at once, and (b) showing the current behavior in the doctor makes the bug debuggable in the meantime. Refs: Azure#7975 lines 117-127, 308-312 (issue spec) Refs: Azure#8057 (PR being implemented) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Wire the Cobra/runner pipe for remote (network-dependent) doctor checks so commits C11-C17 can add individual checks (auth, foundry endpoint reachability, RBAC, agent status) without touching the doctor command's Cobra surface or runner internals. Most of C10's framework was already in place from Phase 4: - `Check.Remote bool` (runner.go:32-37) - `Runner.Run` skips remote checks under `Options.LocalOnly` (runner.go:74-82) - `report.Remote = true` flipped whenever an executed check is Remote (runner.go:128-130) - `--local-only` / `--unredacted` flags bound on the Cobra surface and threaded through `doctor.Options` What was missing was the *factory slot* — a `NewRemoteChecks` mirror of `NewLocalChecks` that the doctor command appends to its check list. Without that slot the only place to add a remote check was the command file itself, breaking the convention established by `NewLocalChecks` (every doctor check lives in the `doctor` package; the command file is plumbing only). Changes ------- * New `internal/cmd/doctor/checks_remote.go`: - `NewRemoteChecks(deps Dependencies) []Check` — empty today; documents the conventions C11+ remote checks must follow (Remote: true, ctx cancellation, skip-cascade against the local chain via `priorBlocked`, redaction discipline under `!Options.Unredacted`). - Names the four follow-up checks the slot is reserved for so a future reader can immediately see the scope without cross-referencing the plan: C11 auth, C12 foundry endpoint, C16 RBAC, C17 agent status. - Explicitly documents that local-checks-then-remote-checks ordering is load-bearing for `priorBlocked` skip-cascade. * `internal/cmd/doctor.go`: - `runDoctor` now builds the runner from `append(NewLocalChecks(deps), NewRemoteChecks(deps)...)` instead of `NewLocalChecks(deps)` alone. Comment on the call site pins the ordering contract. - `doctorFlags` doc comment rewritten to describe today's reality (the wire is fully exercised; remote-checks factory is empty but populated transparently when C11+ land) instead of the pre-C10 wording "no-op today, reserved for an upcoming pass." - `--local-only` and `--unredacted` user-visible help text trimmed of internal plan-tracking jargon (no more "subsequent commits" or "(P5 C11+)"). The first sentence now stands on its own and reads cleanly under `--help`. * `internal/cmd/doctor/types.go`: - `Options.LocalOnly` doc comment updated from "no-op in phase 4 — no remote checks are wired yet" to match the new post-C10 reality (the factory returns empty today; the wire is exercised). * New `internal/cmd/doctor/checks_remote_test.go` (5 tests): - `TestNewRemoteChecks_EmptyTodayButCallable` — pins the contract; when C11 lands the empty-slice assertion fires and forces the author to update the count. - `TestNewLocalAndRemoteChecks_ProductionCompositionLocalsFirst` — pins the load-bearing local-then-remote ordering by reading the actual production factories (not synthesized checks). Asserts (a) every check in NewLocalChecks has Remote=false, (b) every check in NewRemoteChecks has Remote=true, (c) no local check appears after any remote check in the combined slice. Catches a future contributor swapping the append order or forgetting the Remote flag on a remote check. - `TestRunner_LocalThenRemote_RemoteSeesLocalPriorResults` — asserts the runner preserves the slice order so a remote check's `priorBlocked` guard reads the local results. - `TestRunner_LocalOnly_AppendedRemoteCheck_NotInvoked` — exercises the production-shaped `append(local, remote...)` slice under `LocalOnly: true`. - `TestRunner_RemoteCheck_RanProducesReportRemoteFlag` — asserts `report.Remote = true` against the production-shaped slice when a remote check executes. User-visible behavior --------------------- None. The remote-checks factory is empty, so: - `azd ai agent doctor` produces the same 7-local-check report it did before this commit. - `azd ai agent doctor --local-only` produces the same 7-check report (no remote checks to skip). - `azd ai agent doctor --output json` envelope has `"remote": false` (no remote check executed). The change is exclusively in the *plumbing* for the follow-up commits. Preflight --------- * gofmt -s -w . — clean * go vet ./... — clean * go build ./... — clean * go test ./... -count=1 — green (cmd 15.0s, doctor 6.1s, nextstep 3.7s, etc.) * golangci-lint run ./internal/cmd/... — 0 issues * npx cspell lint "internal/cmd/doctor.go" "internal/cmd/doctor/**/*.go" --relative --config ../../.vscode/cspell.yaml --no-progress — 0 issues across 7 files Closes Phase 5 commit slot C10. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Implements Check 7 from the doctor remote-checks design as the first populated entry in the remote chain wired up by C10. The check proves that `azd auth token` can mint a token for the Foundry data-plane scope and surfaces the result with branched severity: - Pass: "<UPN> · token valid for <N> minutes" - Warn (< 5 min remaining): suggest `azd auth login` proactively - Fail (expired or acquisition error): suggest `azd auth login` with a learn.microsoft.com link to the `azd auth login` reference - Skip (user cancelled): no suggestion — Ctrl-C is not an auth bug - Fail (probe deadline): distinct "timed out" message that does NOT immediately accuse the user of being logged out Skip-cascade: `local.environment-selected`. Per the design dependency matrix, an env must be selected before remote checks have a project to test against; otherwise the remote chain would run against the wrong context. `local.grpc-extension` is intentionally NOT a precondition — the auth probe goes straight to azidentity and does not need the AzdClient gRPC channel. Implementation notes: - Uses `azidentity.NewAzureDeveloperCLICredential` (same as `agent_context.go:newAgentCredential`) so the probe mirrors the production credential exactly. - Requests the production data-plane scope `https://ai.azure.com/ .default` (same as `agent_api/operations.go`) so a Pass here matches what the runtime invoke flow needs — not a different scope that might succeed when the runtime would fail. - Wraps the probe in a 10s timeout to bound stuck shells. After the probe returns the check classifies `context.Canceled` / `context.DeadlineExceeded` separately from generic auth errors so user Ctrl-C maps to Skip (not Fail) and a probe timeout maps to a distinct Fail message instead of telling the user to log in again when the real cause is a stuck `azd` invocation. - UPN extraction is best-effort JWT payload decoding (stdlib only, no third-party JWT lib). Tries `upn`, `unique_name`, `preferred_username`, `email` in order. Decode failures return an empty UPN and never an error: the auth check cares about the token's validity, not how readable its claims are. - The raw access token is never logged, returned, or exposed outside `realProbeAuth`. - Wrong-tenant detection is intentionally NOT done here — that's the job of `remote.foundry-endpoint` (C12) where a 403 maps to a precise "wrong tenant or insufficient RBAC" suggestion. Surfacing the same failure mode here would produce false positives — flagging auth as broken when the user IS authenticated, just against the wrong tenant. - `formatTokenWindow` substitutes "less than 1 minute" for any sub-minute positive window so the Warn message can never read "token expires in 0 minutes" — that wording is indistinguishable from expiry to a reader scanning the report. - `firstLine` strips a trailing `\r` after splitting on `\n` so Windows CRLF stderr output from `os/exec`-invoked `azd` does not leak a carriage return into the doctor report message. - Every Fail branch that suggests `azd auth login` now carries the same `authLoginLink` (a single package constant), so all four suggestion paths point at the canonical MS Learn reference. Testability: - New `probeAuth` test seam on `Dependencies` matches the pattern used by `assembleState`. Production wiring leaves it nil; the check falls back to `realProbeAuth`. Tests inject deterministic `authProbeResult` values to exercise each branch. - 22 new tests cover: skip-cascade (Fail + Skip cases); default seam fallback; all severity branches (Pass, Warn, sub-minute Warn, expired Fail with Links, acquisition-error Fail with Links, cancellation Skip, deadline Fail); singular / plural and sub-minute formatting; 5-minute Warn/Pass boundary; UPN claim ordering / empty / whitespace / non-string variants; and JWT decode failure modes (empty token, wrong segment count, invalid base64, non-JSON payload, non-object JSON root). Files: - checks_auth.go (new): newCheckAuth + realProbeAuth + extractUPN + format helpers + authLoginLink constant - checks_auth_test.go (new): comprehensive table-driven coverage - checks_local.go: add `probeAuth` test seam to Dependencies - checks_remote.go: append newCheckAuth(deps) to the remote chain - checks_remote_test.go: replace the empty-slice contract test with one pinning the auth-only shape, so any future addition has to touch this one assertion Reviewer findings applied (3-reviewer pass: Opus xhigh + Sonnet 4.6 + GPT-5.5): - HIGH (Sonnet): Expired-token Fail was missing `Links` — fixed, every Fail with `azd auth login` suggestion now carries the reference link via the new `authLoginLink` constant. - MEDIUM (GPT-5.5): Cancellation / timeout was classified as generic auth failure — fixed, classified separately with appropriate Skip / Fail and distinct messages. - MEDIUM (Sonnet + Opus convergent): Sub-minute positive validity rendered as "token expires in 0 minutes" — fixed via `formatTokenWindow` substituting "less than 1 minute". - MEDIUM (Sonnet): `firstLine` left trailing `\r` on Windows CRLF — fixed with `strings.TrimRight(s[:i], "\r")` + CRLF test case. - LOW (Sonnet): Doc said "false negatives" where logic required "false positives" — corrected. - LOW (Opus): 5-minute Warn/Pass boundary was not directly tested — added `TestCheckAuth_WarnPassBoundaryAtFiveMinutes`. - LOW (Opus): Doc comment cited a non-existent design path — updated to the actual `.tmp/pr-8057/...` location. Preflight: gofmt -s -w . ✓, go vet ./... ✓, go test ./... -count=1 (all packages) ✓, golangci-lint run ./internal/cmd/... ✓, cspell ✓. Refs: Azure#7975 Refs: design `.tmp/pr-8057/azd-ai-agent-doctor-remote-checks.md` Check 7 / dependency matrix lines 112-131 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… C12) Implements Check 8 from the doctor remote-checks design as the second populated entry in the remote chain. The check proves that the configured Foundry project endpoint is reachable AND that the bearer token minted by C11 actually authorizes against it — a combined "can we talk to the project at all" signal that gates every other remote check (models / agent status / RBAC) from the design's dependency matrix. The probe issues a single `GET <endpoint>/agents?api-version=<v>&limit=1` with a 10s timeout, no retries, using the same credential + scope as the production runtime path (`agent_context.go:newAgentCredential` + `agent_api/operations.go`). The `limit=1` parameter matches the production agent_api client exactly so a Pass here proves the same query shape the runtime invoke flow uses (the earlier `$top=1` choice was a divergence flagged by reviewers). The status-code response is mapped 1:1 to user-actionable outcomes: - 200 → Pass: "endpoint reachable (HTTP 200)" - 401 → Fail: "token expired or scope mismatch" + suggest `azd auth login` (link: auth login docs) - 403 → Fail: "wrong tenant or insufficient RBAC" + suggest re-auth in correct tenant and let `remote.rbac` flag the role-assignment gap (deliberately NO `azd auth login` here, but DO carry a docs Link to the Foundry RBAC quickstart so the suggestion is actionable — matches the C11 "every actionable Fail carries Links" convention) - 404 → Fail: "endpoint is wrong or project is gone" + suggest `azd provision` / `azd env set AZURE_AI_PROJECT_ENDPOINT` - 5xx → Fail: "service-side error" + suggest retry - other → Fail: "unexpected HTTP <N>" + verbose hint - transport err → Fail: "could not reach <host>: <first line>" + network / VPN / firewall guidance - ctx canceled → Skip (user aborted) - 10s elapsed → Fail: "did not respond within 10s" + retry hint Skip-cascade: `local.project-endpoint-set` AND `remote.auth`. The former gives us the endpoint to probe; the latter gives us the token to authenticate with. Skipping when either failed prevents double-reporting the same root cause. `local.environment-selected` is transitively cascaded via `local.project-endpoint-set`. Implementation notes: - The api-version is read from the package-level `DefaultAgentAPIVersion` constant by the Cobra wiring in `cmd/doctor.go` and passed in through `Dependencies.AgentAPIVersion`. This honors the design's "single source of truth" requirement while keeping the doctor package self-contained (no import cycle against `internal/cmd`). - `makeRealProbeFoundryEndpoint(apiVersion string) func(...)` is a closure factory rather than a top-level function so the api-version is captured at construction time without becoming a global. - `buildFoundryProbeURL` parses the user-supplied endpoint FIRST, then mutates `u.Path` (trim-right + "/agents"), clears `u.Fragment` / `u.RawFragment`, and only then sets RawQuery. This prevents a stray `?api-version=evil` or `#fragment` in the endpoint from displacing the `/agents` path segment — a silent-misdiagnosis bug the prior `endpoint + "/agents"` string concatenation could trigger. Regression tests now positively assert `/agents?` is present in every successful build output. The builder also returns an error for any endpoint that is not an absolute HTTPS URL with a non-empty host, so a malformed env value cannot leak a bearer token to the wrong scheme/host. - A new `validateFoundryEndpoint` helper runs at the check-level BEFORE the probe is invoked, BEFORE any token is acquired. A non-HTTPS, relative, or otherwise-malformed `AZURE_AI_PROJECT_ENDPOINT` surfaces a precise Fail with an `azd env set AZURE_AI_PROJECT_ENDPOINT <https://...>` suggestion instead of either a generic transport error (with the token leak that would have come with it) or the builder's defensive error wrapped in a less-helpful network-VPN-firewall message. - Cancellation classification mirrors C11's pattern: `errors.Is(ctx.Err(), context.Canceled)` → StatusSkip (user aborted); `errors.Is(probeCtx.Err(), context.DeadlineExceeded)` → StatusFail (we hit our own 10s bound, not the parent ctx). - Multi-line transport errors are reduced to their first line via the shared `firstLine` helper from C11 so the resulting Message stays readable. - The `Details` map carries the endpoint, request URL, and HTTP status code (when available) for `--output json` consumers and `--unredacted` debugging. No raw tokens, no response body excerpts. - 24 tests cover skip-cascade (env-not-selected, endpoint-not-set, auth-failed, AgentAPIVersion-missing), every status-code branch, cancellation vs timeout disambiguation, URL builder safety (junk query in endpoint, trailing slashes, fragment, blank api-version, non-HTTPS / relative / malformed endpoint), endpoint validation (HTTPS-only, non-empty host, well-formed), helper functions (`endpointHost`, `readProjectEndpoint`, `firstLine` reuse), a TLS httptest server smoke test asserting the built URL lands on `/agents` on the wire, and a token-leak sanity check on Details / Message / Suggestion strings. Behavior: with this commit, `azd ai agent doctor` now produces two remote checks (`remote.auth` from C11 + `remote.foundry-endpoint` from C12) instead of just one. The full remote chain still requires C13+ to be useful end-to-end, but every subsequent check can now take a Pass on this one as proof that the project URL works. Refs: Azure#7975, PR Azure#8057 design-spec Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds the third remote doctor check, `remote.rbac`, implementing Check Azure#10 of issue Azure#7975 / Check 10 of the design doc (`azd-ai-agent-doctor-remote-checks.md` lines 159-180): "Developer has required role on Foundry project". The check queries the developer's role assignments on the Foundry project's ARM scope via the Microsoft Graph + ARM stack, then classifies the result: * Pass: "<display> has the required role on project '<acct>/<proj>'" (display name replaced with "the current principal" in the default redacted mode for UPN safety). * Fail: templated `az role assignment create --role "Azure AI User" --assignee <oid> --scope <scope>` + a learn.microsoft.com link to the RBAC concepts page. Principal ID and scope ARN are substituted with shell-safe ALL_CAPS placeholders (OBJECT_ID / PROJECT_SCOPE) in redacted mode — NOT `<redacted>`, because bash/zsh interpret `<word>` as input redirection. * Skip: precondition unmet (no AzdClient, no env, auth Failed, `AZURE_AI_PROJECT_ID` missing/malformed/unset, service-principal token detected, transient Graph/ARM error, cancellation). Architecture ------------ Two new files: internal/project/developer_rbac_query.go (~175 LoC) - QueryDeveloperRBAC: side-effect-free wrapper returning a structured DeveloperRBACResult. Reuses package-private parseAgentIdentityInfo / hasAnyRoleAssignment / sufficientAIUserRoles from developer_rbac_check.go. - ValidateProjectResourceID: shape check for AZURE_AI_PROJECT_ID, returns wrapped ErrInvalidProjectResourceID sentinel. - ErrInvalidProjectResourceID, ErrSPNDelegatedAuthRequired: sentinel errors for diagnostic classification. - Graph /me failure detection: routes app-only/SPN token rejections onto ErrSPNDelegatedAuthRequired (case-insensitive message match) so doctor surfaces a SPN-aware Skip. internal/cmd/doctor/checks_rbac.go (~430 LoC) - newCheckRBAC: skip-cascade + projectID lookup + upfront ValidateProjectResourceID gate + probe dispatch. - classifyRBACProbeError: sentinel-keyed error classification (Canceled / SPN / InvalidProjectID / generic transient). - classifyRBACResult: pure Pass/Fail mapping for diagnostic consumers. - sanitizeScopeARNs: regex-based scope+GUID scrubber for probe error text. - readProjectResourceID: AZURE_AI_PROJECT_ID env lookup via gRPC EnvironmentService. - redactID / redactScope / redactDisplay: centralized placeholder substitution helpers. Two new seams on Dependencies: probeDeveloperRBAC - replaces project.QueryDeveloperRBAC readProjectResourceIDFn - replaces readProjectResourceID Skip-cascade rationale ---------------------- Per design dependency matrix (line 115), `remote.rbac` cascades against `local.environment-selected` + `remote.auth` but NOT `remote.foundry-endpoint`. RBAC reads ARM, not the data plane; a transient DNS / proxy / outage on the data-plane check must not prevent the user from learning their role assignment is missing. `TestCheckRBAC_DoesNotSkipOnFoundryEndpointFail` pins this. Probe errors → Skip (not Fail) to avoid false alarms on transient Graph/ARM hiccups. Cancellation similarly Skips with a clean message rather than rendering an error trace. Review fixes applied -------------------- Following the 3-reviewer pass (Opus xhigh + Sonnet 4.6 + GPT-5.5) of an earlier draft (commit 0c4d5ee), the following findings were addressed: * MEDIUM (Opus + GPT-5.5): probe-error path leaked raw subscription/scope IDs via azcore.ResponseError.Error()'s first line (`GET https://management.azure.com/subscriptions/...`) into Message + Details. Fix: sanitizeScopeARNs regex pass strips ARM scopes + bare GUIDs from Message in redacted mode; Details["probeError"] is OMITTED entirely unless --unredacted. * MEDIUM (Sonnet 4.6): malformed AZURE_AI_PROJECT_ID got "check your network" Suggestion despite no network call. Fix: upfront ValidateProjectResourceID gate runs before the probe; surfaces "is not a valid Foundry project ARM resource ID" with an `azd env set` Suggestion. * MEDIUM (GPT-5.5): PrincipalDisplay rendered verbatim in Message even in default redacted mode; display names can contain UPN fragments (e.g., "Alice (alice@contoso.com)"). Fix: redactedDisplayLabel ("the current principal") substitutes for raw display in default mode; unredacted mode still echoes the real display. * MEDIUM (GPT-5.5): service-principal `azd auth login` cannot use Graph /me — confusing generic transient Skip. Fix: ErrSPNDelegatedAuthRequired sentinel + case-insensitive detection of the canonical "delegated authentication flow" Graph response; doctor surfaces a SPN-aware Skip with user-delegated guidance. * LOW/Nit (Opus): `<redacted>` is a bash input-redirection token; `--assignee <redacted>` would fail with `redacted: No such file or directory` on copy-paste. Fix: shellSafePlaceholderID/Scope constants render `OBJECT_ID` / `PROJECT_SCOPE` in the templated az command. Verified clean (no action): skip-cascade structure, firstLine helper, AZURE_AI_PROJECT_ID provenance (set by init_foundry_resources_helpers.go:327), seam design, account-scope check missing (acceptable per `assignedTo()` inheriting parent- scope assignments). Testing ------- 22 unit tests in checks_rbac_test.go and developer_rbac_query covering: skip cascades, probe-error branches (transient, parse, SPN, defensive sentinel, cancellation), end-to-end probe Pass/ Fail, classifyRBACResult mapping with both redaction modes, display-name fallback, sensitive-identifier leak prevention, shell-safe placeholder substitution, ValidateProjectResourceID shape coverage, sanitizeScopeARNs regex coverage, all three redaction helpers (redactID/Scope/Display) with empty-input + flag permutations. Preflight (from cli/azd/extensions/azure.ai.agents) --------------------------------------------------- gofmt -s -w . - clean go vet ./... - clean go build ./... - clean go test ./... -count=1 - all packages green golangci-lint run ./internal/cmd/... - 0 issues ./internal/project/... npx cspell lint - 0 issues "internal/cmd/doctor/**/*.go" "internal/project/developer_rbac_query.go" Refs: Azure#7975 (PR Azure#8057, Phase 5 / C16) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds the fourth (and final per the current phase-5 scope) remote doctor check: `remote.agent-status`. For each hosted-agent service declared in azure.yaml, the check resolves the deployed agent name + version from the active azd environment (the AGENT_<KEY>_NAME and AGENT_<KEY>_VERSION values written by service_target_agent.go on a successful `azd deploy`) and probes Foundry's `GET /agents/{name}/versions/{version}` endpoint, mapping the lifecycle status to a doctor classification. Per-service classification - Active → Pass (one of N agents active) - Creating → Warn (azd ai agent monitor --follow) - Failed → Fail (azd ai agent monitor --follow, NOT azd deploy — read logs first) - 404 / Deleting / Deleted → Fail (azd deploy) - AGENT_<KEY>_NAME unset, OR NAME set but AGENT_<KEY>_VERSION unset → Fail (azd deploy — the service is declared but has not been deployed cleanly; the post-deploy hook writes both env vars atomically so a half-pair is a deterministic config issue) - Unrecognized status → Fail with raw status surfaced so the user can search; suggests the Foundry portal - Probe error / 401-403 → service-scoped transient skip; does NOT fail the aggregate when other services are healthy Aggregate The per-service entries fold into a single doctor Result via a ranked classifier (`agentClassRank`). Worst-class drives the aggregate Status: - All Active → Pass - Worst is Failed → Fail (monitor) - Worst is Missing (404 / Deleted / Deleting) → Fail (deploy) - Worst is NotDeployed (no env var / half-pair) → Fail (deploy) - Worst is Unknown → Fail (portal) - Worst is Deploying / Creating → Warn (monitor) - Worst is Transient AND ≥1 Active → Pass with Message note - All Transient → Skip (retry) When multiple failing classes coexist (e.g., one Failed agent and one Missing agent in the same project) the dominant class drives the headline and Suggestion. Detail lines are filtered to the dominant class only so the headline count and the rendered body always agree; a short "other agents have additional issues" hint is appended to the Suggestion telling the user to read Details for the secondary fix path. The Message lists up to 3 failing services (with `(N more)` after that). Active services are listed in the all-active Pass message. Details["services"] always carries the full per-service list for JSON consumers. Fan-out Probes execute concurrently across services with a bounded worker pool (probeConcurrency = 4) so wall-clock cost is bounded by the slowest probe rather than the sum of probes. Each probe still enforces its own 6 s timeout via agentStatusProbeTimeout; the parent ctx propagates cancellation to all in-flight workers. Skip-cascade (per design dependency matrix lines 110-117 of .tmp/pr-8057/azd-ai-agent-doctor-remote-checks.md): - local.environment-selected (env reads would fail) - local.agent-service-detected (no service list ⇒ Pass = bug) - remote.auth (no token ⇒ every probe 401s) - remote.foundry-endpoint (no reachability ⇒ N transports) We deliberately do NOT cascade from `remote.rbac`: agent-list / agent-get is Reader-level, and developers with read-only access on the Foundry project still benefit from knowing whether their agents are healthy. Pinned by TestCheckAgentStatus_DoesNotSkipOnRBACFail. Review findings applied (Opus 4.7 xhigh + Sonnet 4.6 + GPT-5.5) - MEDIUM (Sonnet Azure#1 + Opus Azure#1): doc/code/test disagreement on Active + Transient mix. The godoc promised "Pass with note" but the code returned Skip and the test pinned the wrong behavior. Implemented the documented Pass-with-note so a transient probe failure for one service does not mask the healthy state of the rest. Test renamed to TestCheckAgentStatus_Aggregate_ActiveAndTransient_PassWithNote and asserts the Message text. - MEDIUM (Sonnet Azure#2 + Opus Azure#2 + GPT-5.5 Azure#3): headline count and rendered detail body diverged when entries from multiple failing classes coexisted (e.g., "1 of 2 agents are in a failed state" with 2 detail lines including a Missing entry). The aggregate now filters detail lines to the dominant class via `detailsForClass(worst)`, and the Suggestion is enriched with an "Other agents have additional issues (<classes>); see the per-service Details" hint when other failing classes are present. Pinned by TestCheckAgentStatus_Aggregate_FailedDominatesMissing. - MEDIUM (GPT-5.5 Azure#1): AGENT_<KEY>_NAME present with AGENT_<KEY>_VERSION empty was classified as `transient`, surfacing "retry doctor" — wrong for a deterministic config issue. Reclassified as `not-deployed` so the user is told to `azd deploy`. Test TestCheckAgentStatus_MismatchedNameVersion_NotDeployedForService + TestProbeOneService_NameSetVersionEmpty_NotDeployed. - MEDIUM (GPT-5.5 Azure#2): serial probes — 100 services × 6 s could drag the doctor run past 10 minutes. The design spec calls it "fan out", so probes now run in parallel via a bounded 4-worker pool (probeConcurrency). Order preservation guarantees deterministic Details rendering. - LOW (Sonnet Azure#3): added TestProbeOneService_DeletingStatus_Missing covering the `Deleting` lifecycle branch that previously had no direct test (only `Deleted` was covered). Files - internal/cmd/doctor/checks_agent_status.go * newCheckAgentStatus + the Check shape * probeOneService — per-service classification body (now-corrected name-set-version-empty path) * probeAllServices — bounded-concurrency fan-out helper * classifyAgentStatusAggregate — folds entries into one Result with class-filtered detail lines + mixed-class Suggestion hint + Active+Transient Pass-with-note branch * makeRealProbeAgentStatus — production probe closure (uses agent_api.GetAgentVersion + azidentity.NewAzureDeveloperCLI Credential, the same auth path the runtime invoke flow uses) * readAgentNameVersion + readAgentServices helpers * doctorServiceKey (mirrors cmd.toServiceKey; duplicated to avoid an import cycle, same rationale as agentHost in checks_project.go) * Lifecycle constants (Active / Creating / Failed / Deleted / Deleting) sourced from vienna:Contracts/V2/Generated/Agents/AgentVersionStatus.cs * truncateLines / serviceNamesByClass / firstTransient - internal/cmd/doctor/checks_agent_status_test.go (~640 LoC, 34 tests; +2 new tests over v1) * Skip-cascade gates × 9 * Per-service classification × 7 (incl. Deleting, NameNoVersion) * Status case-insensitive matching * Aggregate ranking × 5 (incl. Active+Transient Pass-with-note, FailedDominatesMissing with Message-text assertion) * Aggregate truncation at 3 + "(N more)" * probeOneService transport branches × 6 (incl. context.Canceled and context.DeadlineExceeded handling) * Service-key edge cases, rank fallback, truncateLines boundary, makeRealProbeAgentStatus closure check, plus a ServerHandler-based smoke test that the azcore.ResponseError code is surfaced via statusCode - internal/cmd/doctor/checks_local.go * Dependencies struct grows two new seams: probeAgentStatus + readAgentNameVersionFn (mirrors the probeDeveloperRBAC + readProjectResourceIDFn pattern from C16) - internal/cmd/doctor/checks_remote.go * NewRemoteChecks adds newCheckAgentStatus(deps) as the 4th entry, after auth / foundry-endpoint / rbac - internal/cmd/doctor/checks_remote_test.go * TestNewRemoteChecks_HasAuthFoundryEndpointRBACAndAgentStatus pins the 4-entry shape Out of scope (deferred) - Sharing the credential across services: each probe currently constructs its own azidentity.NewAzureDeveloperCLICredential. Since the credential is essentially a thin shell around `azd auth token`, the cost is negligible (a single in-process call per probe) and threading it through complicates the test-seam shape. Will revisit if benchmarks surface it. - Re-using readAgentNameVersion for the doctor's eventual ENV pretty-print mode. Out of scope for the check itself; the helper is unexported and can be promoted when the renderer needs it. Preflight (from cli/azd/extensions/azure.ai.agents) - gofmt -s -w . clean - go vet ./... clean - go build ./... clean - go test ./... -count=1 32 doctor tests + full ext suite PASS - golangci-lint run ./... 0 issues - cspell lint ... (cspell.yaml) 0 issues Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fixes two pre-existing UX bugs in `azd ai agent run`: B5 (Next: race): pre-C8 the Next: block was rendered BEFORE `proc.Start()`, so its "in another terminal, try: <example>" line could land in the user's terminal seconds before the agent's port was bound. Following the suggestion verbatim while the agent was still starting yielded a connection-refused. B6 (stale cached OpenAPI): pre-C8 the OpenAPI probe was strictly cache-only. The first `run` ever issued — and every `run` against an agent whose schema had drifted — surfaced a protocol-generic `<payload>` literal instead of the agent's actual example. Scope — only `run.go` uses the live probe. `show.go` (`WithOpenAPIProbe (name, "remote")`), `deploy`/artifact-note path, `init.go`, and `init_from_code.go` remain cache-only by design. Changes ------- nextstep/state.go - New functional option `WithLiveOpenAPIProbe(fetch func(context. Context) ([]byte, error))`. Stores the fetcher in `cfg.openAPILiveFetch`. - `populateOpenAPIPayload` now takes `(ctx, cfg, projectPath, envName, state)`. Order of resolution: (1) live, on success → use it; (2) cache (existing `WithOpenAPIProbe` path), on success → use it; (3) leave HasOpenAPI=false. Every failure is silent. - `assembleState` threads `ctx` through to the new signature. - Doc comment on `WithOpenAPIProbe` updated to note that `WithLiveOpenAPIProbe` overrides it when both are supplied. run.go - Removed the pre-Start `nextstep.AssembleState` + `PrintNext` block that was emitting Next: before the agent bound its port. - After `proc.Start()`, spawn `emitNextAfterBind` in a goroutine with a `nextDone` channel. After `proc.Wait()` returns, the main flow calls `cancel()` and waits on `<-nextDone` so the goroutine is fully joined before stdout writes from `runRun`'s caller resume — closes a stdout race on shutdown. - `emitNextAfterBind` early-returns when stdout is not a terminal, honoring the documented nextstep call-site TTY-gating contract (matches `invoke.go:217`, `show.go:381`). Redirected stdout (`run > log`) no longer receives the banner or Next: block. - After waitForPortReady succeeds, builds a closure that wraps `fetchLiveOpenAPI(ctx, port)` and passes BOTH `WithOpenAPIProbe(serviceName, "local")` (cache fallback) and `WithLiveOpenAPIProbe(...)` to `AssembleState`. The state assembler picks live first, falls back to cache silently if the live fetch fails. - Re-checks `ctx.Err()` after `AssembleState` returns so a Ctrl+C arriving mid-call doesn't surface "Agent ready" after "Agent stopped." was already printed. - Four new constants: `portReadyBudget` (5 s), `portReadyPollInterval` (100 ms), `portReadyDialTimeout` (50 ms), `liveOpenAPITimeout` (3 s). - `waitForPortReady(ctx, port, budget) bool`: bounded TCP dial-loop that honors ctx. - `fetchLiveOpenAPI(ctx, port) ([]byte, error)`: uses `http.NewRequestWithContext` to GET `http://localhost:<port>/invocations/docs/openapi.json`. The route matches the cache-side fetcher in `helpers.go:368` and the user-facing curl tip in `nextstep/resolver.go:226`. Non-200 responses are returned as errors so the assembler falls back to cache rather than ingesting a 404 body via `ExtractInvokeExample`. Tests ----- state_test.go (+5 new TestAssembleState_WithLiveOpenAPIProbe_* cases + expanded `TestOptionsApplyCleanly`): - PrefersLiveOverCache, FallsBackToCacheOnError, FallsBackToCacheOnEmptyBody, LiveWorksEvenWithoutCacheProbe, LiveFailureWithoutCacheLeavesUnset. run_test.go (+8 new tests + `listenLoopback` helper): - 3× waitForPortReady (bound port, budget elapse, ctx cancel). - 3× fetchLiveOpenAPI (200 body asserts `/invocations/docs/openapi.json` path, non-200 error, ctx deadline). - 2× emitNextAfterBind (never-binds, ctx cancelled — both pass nil azdClient through the safe early-return paths to verify the helper exits silently without panic or goroutine leak). Preflight clean: gofmt -s -w, go vet, go build, go test ./internal/cmd/... ./internal/cmd/nextstep/... -count=1 (cmd 10.5 s, doctor 1.6 s, nextstep 1.9 s), golangci-lint run ./internal/cmd/... (0 issues), cspell on the four modified files (0). Review fixes (3-reviewer pass) ------------------------------ Three independent reviewers (Opus xhigh, Sonnet 4.6, GPT-5.5) reached consensus on three correctness findings before merge: - Live probe URL corrected to /invocations/docs/openapi.json (matches existing cache fetcher and user-facing curl tip). - Banner + PrintNext now gated on isTerminal(os.Stdout.Fd()) to honor the nextstep call-site contract. - emitNextAfterBind goroutine is now joined via nextDone channel after proc.Wait, and re-checks ctx.Err() before printing so the banner cannot land after "Agent stopped." - Replaced misleading "ReturnsSilentlyWhenPortNeverBinds" test that only exercised waitForPortReady with two tests that actually call emitNextAfterBind with nil azdClient on the safe early-return paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…P5.1 C12) Adds the final doctor check (P5.1 C12) that surfaces deployed-agent managed-identity role assignments at three ARM scopes — project, account, and resource group. For each agent classified active by the upstream `remote.agent-status` check, the new `remote.agent-identity-roles` check fetches the agent's `instance_identity.principal_id` from Foundry and lists ARM role assignments at all three scopes via the existing `armauthorization` SDK already pulled in for the developer-RBAC check. ## What lands `internal/project/agent_identity_query.go` (new, ~340 LoC): - Public API `QueryAgentIdentityRoles(ctx, azdClient, projectResourceID, principals) (*AgentIdentityRolesResult, error)` reuses `parseAgentIdentityInfo` from `agent_identity_rbac.go` to derive the three scope ARNs, looks up the user-access tenant via `LookupTenant`, builds an `AzureDeveloperCLICredential` pinned to that tenant, and fans out per-principal role-assignment listings with `wg.Go`. - Public types `AgentPrincipal`, `AgentScopeRoles`, `AgentIdentityRolesEntry`, `AgentIdentityRolesScopes`, `AgentIdentityRolesResult` form the structured listing the doctor renders. - `queryAgentIdentityRolesWithLister` separates the per-scope listing strategy from credential acquisition so unit tests drive the inner classifier without standing up ARM fakes. - `listRoleNamesAtScope` lists role assignments with ARM's server-side `assignedTo('<principal>')` filter, then resolves role-definition IDs to human-readable names via `RoleDefinitions.Get`. Failures on individual role-name resolution downgrade gracefully (omitted from the listing). `internal/cmd/doctor/checks_agent_identity_roles.go` (new, ~640 LoC including doc comments): - `newCheckAgentIdentityRoles(deps)` builds the check Closure. Skip cascade against `local.environment-selected`, `local.agent-service-detected`, `remote.auth`, `remote.foundry-endpoint`, and `remote.agent-status`'s Pass (per the design's "for each active agent found in check 11"). - `readActiveAgents(prior)` enumerates agents reachable to this check by reading the upstream `remote.agent-status` Details' `services` slice and filtering to Classification == "active". - `classifyOneAgent` buckets a single agent into fine / underscoped / empty / unknown per the design's pass condition ("project + (account|RG) covered"). `describeOneAgent` renders the one-line per-agent breakdown (`<agent>: project=N, account=M, resource-group=K`, with `?` on probe-error scopes). - `classifyAgentIdentityRolesAggregate` folds per-agent entries into a single doctor Result: all "fine" → Info; any "empty" → Fail (smoking-gun for "every tool call 403s"); worst "underscoped" → Warn; worst "unknown" → Warn. - `makeRealProbeAgentPrincipal` builds the production probe closure (mirrors `makeRealProbeAgentStatus` byte-for-byte apart from the field consumed — `InstanceIdentity.PrincipalID` vs `Status`). ## Renderer additions `StatusInfo` joins the existing Pass/Warn/Fail/Skip status set so the "all agents are fine" case can surface as an informational role listing without flagging the run yellow. - `types.go`: `StatusInfo Status = "info"` + `Summary.Info int` (JSON tag added). - `runner.go`: canonical validation switch + summarize switch extended for Info; `ExitCode` treats Info as a "useful diagnostic completed" status (matches Pass for exit-code purposes). - `doctor_format.go`: glyph "ⓘ" and label "INFO" added to `statusGlyphAndLabel`; `writeSummaryLine` appends ", N info" when Info > 0 (preserves existing test assertions otherwise). ## Dependencies wiring `internal/cmd/doctor/checks_local.go` adds two test seams on `Dependencies`: - `probeAgentPrincipal` — replaces the production `GetAgentVersion` call with a unit-test fake. Same signature shape as `probeAgentStatus`. - `queryAgentIdentityRoles` — replaces the production `project.QueryAgentIdentityRoles` call. Signature mirrors the public API so wiring is a single substitution. `internal/cmd/doctor/checks_remote.go` appends `newCheckAgentIdentityRoles(deps)` after the existing `remote.agent-status` entry in `NewRemoteChecks`. The append-after ordering is load-bearing — every skip-cascade guard in C12 reads `remote.agent-status`'s Result from `prior []Result`, and the local-then-remote ordering invariant remains intact (verified by the existing `TestNewLocalAndRemoteChecks_ProductionCompositionLocalsFirst`). ## Tests `internal/cmd/doctor/checks_agent_identity_roles_test.go` (new, 16 KB): - Skip-cascade gates: nil AzdClient, `remote.agent-status` not Passed, project endpoint missing, no active agents, project-resource-ID unset, project-resource-ID malformed. - Aggregate classification: Info when all fine; Fail when any agent has zero roles; Warn when worst is underscoped; Warn on transient query error. - Per-agent classifier table (six cases: project+account, project+RG, project-only, account-only, all-empty, all-errored). - Detail formatting: scope counts and `?` for probe-error scopes. - Missing-principal degradation: agent with no `instance_identity` surfaces as a warning rather than a fail. - `readActiveAgents` filtering invariants (active-only, missing-name dropped, nil-return on missing upstream). `internal/cmd/doctor/checks_remote_test.go` updated: the `NewRemoteChecks` contract test now pins five entries (auth → foundry-endpoint → rbac → agent-status → agent-identity-roles) with their ID / Name / Remote / Fn invariants. ## Preflight - gofmt -s -w . clean - go vet ./... clean - go build ./... clean - go test ./... -count=1 all green - internal/cmd 14.6s - internal/cmd/doctor 1.6s - internal/cmd/nextstep 3.8s - internal/pkg/agents/agent_api 10.9s - internal/pkg/agents/agent_yaml 1.0s - internal/pkg/azure 12.7s - internal/project 5.5s - golangci-lint run ./internal/cmd/... ./internal/project/... 0 issues - cspell on new files (after "underscoped" added to cspell.yaml) 0 issues - copyright-check.sh on extension clean ## Design notes - The spec at `.tmp/pr-8057/azd-ai-agent-doctor-remote-checks.md` lines 193–223 specifies a per-agent fan-out at three scopes with the "fine" pass condition (project + (account|RG)). Renders as INFO rather than PASS because the design's intent is a diagnostic listing — operators inspect it on `--output json` and confirm no MI is starved; the check should not flip the doctor green on its own. - C12 uses the `wg.Go` Go 1.26 idiom for per-principal fan-out; per-scope probes within one principal run sequentially (3 scopes × 1 ARM listing each is well under budget and avoids the goroutine-per-scope-explosion). - `probeAgentPrincipal` deliberately does NOT extend C17's `probeAgentStatus` surface — extending it would couple two independent checks. The mirror cost is one ~40-line factory function shared by both. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…P5.1 C2) Adds a best-effort manifest walker that surfaces model / toolbox / connection resources from each service's `agent.manifest.yaml` into `nextstep.State`. Unblocks the doctor checks C13 (model deployments), C14 (toolboxes), and C15 (connections), all of which need to know whether the relevant resource kinds are declared before they can decide to run or skip. ## State additions - `State.HasModels`, `State.HasToolboxes`, `State.HasConnections` — aggregate boolean flags. True when at least one matching resource is found across all `azure.ai.agent` services. - `State.ModelRefs`, `State.Toolboxes`, `State.Connections` — sorted `[]ResourceRef` per kind. - `ResourceRef{Name, ServiceName, Detail}` — slim doctor-facing shape. Detail carries the kind-specific identifier (model id, connection `<Category> | <Target>`, empty for toolboxes). ## Walker semantics - File names probed (in order): `agent.manifest.yaml`, `agent.manifest.yml`. `agent.yaml` is deliberately excluded — it describes the container, not declared resources. - Uses `agent_yaml.ExtractResourceDefinitions` directly (NOT `LoadAndValidateAgentManifest`) so a manifest with an absent / partial `template` block — common during init — still surfaces its `resources:` declarations. - Best-effort: missing file, unreadable bytes, malformed YAML, zero resources, and unknown resource kinds all silently degrade (Has* flags stay false; lists stay nil). Walker never adds to the `errs` slice so a manifest-in-flight (which init re-writes mid-flow) never blocks the rest of state assembly. - Dedup key is `(ServiceName, Name)`. Same name twice in one service collapses to one entry (first-occurrence wins, matching agent_yaml's manifest semantics). Same name under two services surfaces twice so per-service doctor failures remain individually addressable. - Result slices are sorted by `(Name, ServiceName)` so doctor output snapshots and downstream renderers are deterministic. ## Why this is its own commit The walker is a pure data-collection step with no resolver-side consumers in this commit. Doctor checks C13/C14/C15 (following commits) gate-skip themselves on `state.Has{Models,Toolboxes, Connections}` and iterate the matching ref slice. Landing the walker first keeps each downstream commit focused on its single check. ## Tests 8 new tests in `manifest_test.go`: - All three kinds present → flags + lists populated, sort order + detail formatting locked. - Missing manifest → silent, no errors logged through the walker. - Malformed YAML → silent, no errors. - Manifest with no `resources:` key → silent, flags false. - Multi-service aggregation → entries sorted by Name, ties broken by ServiceName. - Duplicate `(service, name)` within one manifest → first occurrence wins. - `.yaml` wins over `.yml` when both exist. - `agent.yaml` (not a manifest) is ignored even if its content happens to parse as one. - `connectionDetail` table-driven test covers all four category/target combinations. ## Preflight - `gofmt -s -w .` — clean - `go vet ./...` — clean - `go test ./... -count=1` — full extension suite green - `golangci-lint run ./internal/cmd/...` — 0 issues - `cspell` over the touched files — 0 issues Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds the seventh doctor remote check (`remote.model-deployments`) which verifies that every model resource declared in any service's `agent.manifest.yaml` (collected by the C2 manifest walker into `State.ModelRefs`) has a corresponding Cognitive Services deployment on the Foundry project's underlying account. # What it does For each project run: 1. Skip-cascade gates (in order): `AzdClient` nil → `local.environment-selected` → `local.azure-yaml` / `local.agent-service-detected` → `remote.auth` → `remote.foundry-endpoint` → `!state.HasModels` → `AZURE_AI_PROJECT_ID` unreadable / cannot be parsed. Every gate produces a single, actionable Skip message that points the user at the upstream check. 2. Parse `AZURE_AI_PROJECT_ID` (Foundry project ARM resource ID) for `(subscription, resourceGroup, accountName)` via the new `parseAccountFromProjectID` helper. Deployments live at the Cognitive Services *account* level, NOT the project level, so the account name is the load-bearing parameter here. 3. Issue exactly one `armcognitiveservices/v2.DeploymentsClient .NewListPager` round trip via the new `realProbeModelDeployments` helper, capped at 10s (matches the design's per-probe budget in `.tmp/pr-8057/azd-ai-agent-doctor-remote-checks.md`). Returns `[]string` of deployment names. Transport errors short-circuit the check to Skip with the error verbatim plus an actionable retry suggestion — we can not distinguish "deployment missing" from "ARM unreachable" without a successful round trip. 4. `classifyModelDeployments` joins `state.ModelRefs` to the deployment set on name. All match → Pass with the matched count. One or more missing → Fail with the missing names listed in the Message and structured under `Details["missingModels"]` (each entry carries both `name` and `service` so the user can locate the offending manifest entry). Suggestion: `azd provision` to create the missing deployments, or update `agent.manifest.yaml` `resources[].name` to match deployments that already exist. # Aggregation The walker may surface `ModelRefs` from multiple services. Every service in an azd project belongs to the same Foundry project (and therefore the same Cognitive Services account), so the check issues exactly one deployments list per run regardless of how many services / model refs exist. The same model referenced by two services collapses to a single match check; a missing model referenced by two services surfaces as two `missingModels` entries (one per service) so the user can pinpoint each affected manifest. # Test seam `Dependencies.probeModelDeployments` (lowercase, package-internal) matches the established pattern from `probeAuth`, `probeFoundryEndpoint`, `probeDeveloperRBAC`, `probeAgentStatus`, `probeAgentPrincipal`. Production wiring leaves it nil; tests inject a closure that returns canned `(names, err)` tuples and (optionally) captures the `(subscription, resourceGroup, accountName)` it was called with. `Dependencies.assembleState` and `Dependencies.readProjectResourceIDFn` are reused from earlier checks; no new top-level seam is added besides the probe. # Files - `internal/cmd/doctor/checks_model_deployments.go` — new check factory `newCheckModelDeployments`, `parseAccountFromProjectID`, `classifyModelDeployments`, `realProbeModelDeployments`, `listDeploymentNames`. 363 lines. - `internal/cmd/doctor/checks_model_deployments_test.go` — 11 tests: skip-cascade (1 + table of 5 upstream-blocked permutations), no manifest models, unset project ID, unparsable project ID, probe transport error, all-match Pass, partial-match Fail, all-missing Fail, parser table (canonical / mixed-case / missing segments / garbage), factory shape pin. - `internal/cmd/doctor/checks_local.go` — adds the `probeModelDeployments` field to `Dependencies` next to its same-shape siblings. - `internal/cmd/doctor/checks_remote.go` — appends `newCheckModelDeployments` after `newCheckAgentIdentityRoles` in `NewRemoteChecks`. - `internal/cmd/doctor/checks_remote_test.go` — extends the composition pin test to assert 6 checks (was 5) including the new `remote.model-deployments` slot. # Preflight - `gofmt -s -w .` clean. - `go vet ./...` clean. - `go build ./...` clean. - Full extension test suite: green (`cmd`, `cmd/doctor`, `cmd/nextstep`, `exterrors`, `agents/agent_api`, `agents/agent_yaml`, `pkg/azure`, `project` — all pass). - `golangci-lint run ./internal/cmd/doctor/...` 0 issues. - `cspell` 0 issues on production file. - No `go.mod` or `go.sum` changes (uses already-imported `armcognitiveservices/v2`). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds the eighth local doctor check, `local.toolboxes`, which verifies that every ToolboxResource declared in any service's agent.manifest.yaml has its canonical MCP endpoint env var (TOOLBOX_<NORMALIZED_NAME>_MCP_ENDPOINT) set in the active azd environment. Why local (Remote: false). The check only reads the active azd environment via the existing gRPC env service — no ARM / Foundry round trips. Tagging it local means `--local-only` still runs it (which is exactly what we want: a missing TOOLBOX_*_MCP_ENDPOINT is diagnosable without network access). Skip cascade. Skips on AzdClient==nil, local.environment-selected, local.azure-yaml, local.agent-service-detected, and when state.HasToolboxes is false. Deliberately NOT gated on remote.auth / remote.foundry-endpoint — a down Foundry must not poison a local env diagnostic. Classification. - All endpoints set → Pass with matchedCount. - One or more missing → Fail with the missing toolbox names + env var keys in the Message, Suggestion points at `azd provision` (the canonical fix) or `azd env set` as the manual override, Details["missingToolboxes"] carries a structured list (name, service, envVar) for JSON consumers. - Env lookup transport error → Fail (NOT Skip). Divergent from C13's model-deployments which Skips on probe error, because env lookup is local; a transport failure here means the user's azd config / extension is broken and a Skip would silently swallow that signal. Suggestion points at `azd env list` / `azd env get-values`. - Empty / whitespace-only value → treated as missing (matches detectMissingVars semantics in nextstep/state.go). Convention. TOOLBOX_<NORMALIZED_NAME>_MCP_ENDPOINT, with name upper-cased and `-` / `.` / ` ` mapped to `_`. Matches the hosted-toolbox Bicep sample output names. The prefix and suffix are pinned in code (not derived from the env) so the Fail message can name the exact env var the user must grep their Bicep template for. Dedup. classifyToolboxEndpoints dedupes on the canonical env key because the C2 manifest walker dedupes on (ServiceName, Name) — the same toolbox referenced by two services would otherwise produce two env lookups and two missing-list entries. Exposed `dedupToolboxKeys` for callers (renderer / future telemetry) that want the expected-key list up front; the classifier does its own inline dedup so it does not depend on this helper. Test seam. `Dependencies.lookupToolboxEnv toolboxEnvLookupFn` matches the established seam pattern (probeAuth, probeFoundryEndpoint, probeModelDeployments). Production wiring leaves it nil; the check binds `makeRealToolboxEnvLookup(deps .AzdClient)` on first call, which calls `client.Environment().GetValue` — the canonical one-key env reader used by service_target_agent.go and checks_rbac.go. Tests (15). Skip cascade (azdClient nil + 3 priors), not-gated-on- remote-priors invariant, state emptiness (no toolboxes / nil state), 3 classifier paths (all-set / partial / all-missing), whitespace-as-missing, transport-error-is-Fail, cross-service dedup, normalizeToolboxName table (8 cases), toolboxEndpointKey roundtrip, dedupToolboxKeys table, factory-shape pin. Wired into NewLocalChecks (now 8 entries); local-checks pin test updated. NewRemoteChecks unchanged (still 6 entries). Preflight clean: gofmt, vet, build, full extension test suite green (cmd 16.7s, doctor 2.9s, nextstep 6.7s, etc.), golangci-lint 0 issues, cspell 0 issues on production files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Issue: Doctor previously had no way to verify that Foundry connections referenced by agent manifests (e.g. `bing-grounding`, key-vault-backed auth connections) actually exist on the project. Failure surfaced later at invoke time as 401/403 from the upstream tool, with no clear path back to the missing connection. Approach: New `remote.connections` doctor check that enumerates manifest ConnectionResource entries (already discovered by the C2 manifest walker into `state.Connections`), calls `FoundryProjectsClient.GetAllConnections(ctx)` on the active project, and reports any manifest-declared connection that isn't present on the project as a Fail. Missing-entry rendering format: `<name> [<detail>] (service <svc>)` when Detail is non-empty, falling back to `<name> (service <svc>)` to avoid a bare `[]`. Detail typically renders as `<Category> | <Target>` for connections (types.go:183-189; manifest.go:152-162). Classification: REMOTE check (Remote: true). Calls the Foundry API. Skip cascade mirrors C13 (`remote.model-deployments`): AzdClient → environment-selected → azure.yaml / agent-service-detected → remote.auth → remote.foundry-endpoint → !state.HasConnections → unparsable project ID Probe error → Skip (matching C13's pattern — distinguishing transport failure from "missing connection" requires a successful round-trip). 10s probe timeout. Wiring: `newCheckConnections(deps)` added as the 7th and final entry in `NewRemoteChecks` (after C12 agent-identity-roles, C13 model- deployments). Pin test `TestNewRemoteChecks_HasAuthFoundryEndpointRBAC AgentStatusIdentityRolesModelDeployments` renamed to `...ModelDeploymentsConnections`, Len bumped 6 → 7, 4 new index-6 assertions for ID / Title / Description / Remote. Test seam: New `probeFoundryConnections` field appended to `Dependencies` matching the existing seam pattern from `probeModelDeployments` (C13). Production wiring uses `realProbeFoundryConnections` which constructs a credential via `azidentity.NewAzureDeveloperCLICredential` (matching the rest of the extension per `agent_context.go:101-109`). New helper: `parseAccountProjectFromProjectID(projectID) (account, project, err)` — sibling of C13's `parseAccountFromProjectID`. Returns two segments instead of the four C13 needs; kept separate to avoid churning C13's signature for a single new caller. Both case-insensitive on segment markers. Follow-up: consolidate into a single parser when a third caller appears. Tests: `checks_connections_test.go` — 13 tests mirroring C13 patterns: - Skip cascade table (5 rows: AzdClient, environment, azure.yaml / agent-service-detected, auth, foundry-endpoint). - State emptiness (HasConnections false → Skip). - Project ID unset → Skip. - Project ID unparsable → Skip. - Probe error → Skip. - All-match → Pass. - Partial mismatch → Fail with missing names + service tags. - All-missing → Fail. - Empty-Detail rendering omits `[]`. - Parser table (5 cases: canonical, mixed case, missing project, missing account, garbage). - Factory shape pin (Remote: true, ID, Title, Description). Preflight: - gofmt -s -w . (clean) - go vet ./... (clean) - go build ./... (clean) - go test ./... -count=1 (all packages pass; doctor 5.474s) - golangci-lint run ./internal/cmd/doctor/... (0 issues) - cspell lint "internal/cmd/doctor/*.go" (14 files, 0 issues) - copyright header verified on both new files Files: - internal/cmd/doctor/checks_connections.go (NEW, +332) - internal/cmd/doctor/checks_connections_test.go (NEW, +326) - internal/cmd/doctor/checks_local.go (probeFoundryConnections seam +5) - internal/cmd/doctor/checks_remote.go (wire +newCheckConnections +1) - internal/cmd/doctor/checks_remote_test.go (pin test 6 → 7, +14) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Two reviewer-consensus findings from the batched code review of commits 385b147 (C13 remote.model-deployments), 87e1dcc (C14 local.toolboxes), and 1b3143f (C15 remote.connections): Fix 1 (MEDIUM, Opus + GPT-5.5): toolbox env-key normalizer divergence. C14's `normalizeToolboxName` only mapped `-`, `.`, and ` ` to `_` rune-by-rune, while the production helpers `init.go:toolboxMCPEndpointEnvKey` (manifest injection) and `listen.go:toolboxMCPEndpointEnvKey` (runtime env write) both use the regex `[^A-Z0-9]+` -> `_` (run-collapsing, all non-alphanumerics). The two algorithms agreed only on the subset of inputs the test table exercised (`web-search-tools`, `my.toolbox.v2`, `my toolbox`, ...) and diverged on inputs like `my--tool`, `my+tool`, `my:tool`, `my(tool)`, `my\ttool`. A user with such a toolbox name would see the doctor flag a missing endpoint under a key (`TOOLBOX_MY__TOOL_MCP_ENDPOINT`, `TOOLBOX_MY+TOOL_MCP_ENDPOINT`, ...) that nothing in the system ever writes. Resolution: hoist the canonical helper into a shared `internal/pkg/envkey` package so the producing and diagnostic sides cannot drift again. - new internal/pkg/envkey/envkey.go -- `ToolboxMCPEndpoint` - new internal/pkg/envkey/envkey_test.go -- 13 cases incl. double-hyphen run, `+`, `:`, `/`, tab, parens, empty - internal/cmd/listen.go -- drop local helper, drop `regexp` import, route through envkey - internal/cmd/init.go -- route through envkey - internal/cmd/init_test.go -- delete duplicated table (covered by envkey package test) - internal/cmd/doctor/checks_toolboxes.go -- drop local normalizeToolboxName / toolboxEndpointKey /toolbox{Prefix, Suffix}, route 2 callsites through envkey - internal/cmd/doctor/checks_toolboxes_test.go -- replace the normalize-table test with a thin pin test verifying the doctor's renderer helper routes through envkey - cspell.yaml -- allowlist `envkey` Fix 2 (MEDIUM, Sonnet): assembler errors silently swallowed. C13/C14/C15 all used `state, _ := assembler(...)` and reported a `Skip` with "no X declared in any service's agent.manifest.yaml" whenever `state == nil || !state.HasX`. The existing pattern at `checks_manual_env.go:95-109` instead captures `errs` and Fails with the actual cause when `state == nil` (defensive against a future contract change where AssembleState may return a nil state with a populated errs slice). Resolution: mirror the established pattern in all three new checks. The Skip for `!state.HasX` is preserved; only the `state == nil` branch becomes a Fail surfacing `errs[0].Error()`. - checks_model_deployments.go -- Fail-on-nil with cause - checks_toolboxes.go -- Fail-on-nil with cause - checks_connections.go -- Fail-on-nil with cause - checks_model_deployments_test.go -- new test: nil state surfaces errs[0] - checks_toolboxes_test.go -- update existing `SkipsWhenAssemblerReturnsNil` to `FailsWhenAssembler ReturnsNilState` plus new test asserting errs[0] surfaces in the Fail message - checks_connections_test.go -- new test: nil state surfaces errs[0] Not addressed (deferred): LOW (GPT-5.5): `parseAccountProjectFromProjectID` (C15) accepts partial paths; `parseAccountFromProjectID` (C13) does not. Opus reviewed and called the dual-parser duplication "defensible for two callers"; commit 1b3143f's message already notes the follow-up to consolidate when a third caller appears. Preflight: - gofmt -s -w . (clean) - go build ./... (clean) - go vet ./... (clean) - go test ./... -count=1 (all packages pass; envkey 1.837s, doctor 6.276s, cmd 14.401s) - golangci-lint run ./... (0 issues) - cspell lint <new+touched> (17 files, 0 issues) Files (10): - internal/pkg/envkey/envkey.go (NEW) - internal/pkg/envkey/envkey_test.go (NEW) - internal/cmd/listen.go (MOD) - internal/cmd/init.go (MOD) - internal/cmd/init_test.go (MOD) - internal/cmd/doctor/checks_toolboxes.go (MOD) - internal/cmd/doctor/checks_toolboxes_test.go (MOD) - internal/cmd/doctor/checks_model_deployments.go (MOD) - internal/cmd/doctor/checks_model_deployments_test.go (MOD) - internal/cmd/doctor/checks_connections.go (MOD) - internal/cmd/doctor/checks_connections_test.go (MOD) - cspell.yaml (MOD) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When Doctor/post-deploy guidance has no cached OpenAPI-derived sample payload but a service README is present, don't suggest a concrete protocol-generic payload that may fail for that sample's schema. Emit the README pointer first, then an invoke command with an explicit '<payload>' placeholder. Cached OpenAPI payloads still produce runnable invoke commands, and services without a README still get the protocol-generic fallback payload with a generic label. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Active/idle show output should stay an inspection view of the hosted agent resource. Remove the active-state invoke suggestion from ResolveAfterShow and avoid attaching next_step in show JSON when the agent is already healthy. Non-active states keep actionable guidance: - creating -> monitor --type system --follow - failed/empty -> monitor --follow - deleting/deleted -> azd deploy - unknown -> azd ai agent show <service> This also avoids state/OpenAPI assembly work for active show output because no active-state guidance is rendered. Validation: - go test ./internal/cmd ./internal/cmd/nextstep -run 'TestResolveAfterShow|TestResolveNextStepFromStatus|TestShowResultJSON|TestPrintAgentVersionJSON|TestPrintAgentVersionTable|TestResolveAfterInvoke_Success|TestResolveAfterInit_UnresolvedPlaceholders|TestResolveAfterRun' -count=1 - go test ./internal/cmd/... -count=1 - go vet ./internal/cmd/... - golangci-lint run ./internal/cmd/... - cspell lint touched show/nextstep files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Stream text-mode doctor output by observing each finalized check result, while keeping JSON output buffered and unchanged. Split the text formatter into header/check/footer pieces so the streaming path preserves the existing report shape. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove unnecessary explanatory comments across the Azure AI Agents extension and shorten the remaining comments to focus on non-obvious contracts and behavior. Also drops the unused .tmp/ ignore entry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Remove the reserved doctor --output flag path, apply go fix modernizations, and reword cspell-sensitive text. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Restore comments that predated this PR in existing files so the cleanup only affects new or changed comment surfaces. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 73 out of 73 changed files in this pull request and generated 10 comments.

+	state, _ := nextstep.AssembleStateFromSource(ctx, nextstep.NewSource(azdClient))
+	if len(state.Services) == 0 {
+		// Avoid repeating the missing-service guidance already reported by
+		// `local.agent-service-detected`.
+		return nil
+	}


+			// os.Exit skips defers, so do not add cleanup-critical defers here.
+			code := doctor.ExitCode(report)
+			if code == 0 {
+				return nil
+			}
+			os.Exit(code)
+			return nil // unreachable


 	if info, err := resolveAgentServiceFromProject(ctx, azdClient, rc.name, a.noPrompt); err == nil {
-		if rc.name == "" && info.AgentName != "" {
+		rc.serviceName = info.ServiceName
+		if info.AgentName != "" {
 			rc.name = info.AgentName
 		}


+		"creating":                       string(AgentVersionCreating),
+		"active":                         string(AgentVersionActive),
+		"failed":                         string(AgentVersionFailed),
+		"deleting":                       string(AgentVersionDeleting),
+		"deleted":                        string(AgentVersionDeleted),


+	curr := parsePendingProvisionReasons(priorRaw)
+	next := parsePendingProvisionReasons(formatPendingProvisionReasons(mutate(curr)))
+	formatted := strings.Join(next, ",")
+	if formatted == priorRaw {
+		return next, nil
+	}


+			name: "indented output (inside a conditional block) accepted",
+			in: `if (condition) {
+  output INDENTED_OUTPUT string = 'value'
+}
+`,
+			want: []string{"INDENTED_OUTPUT"},
+		},


+// dedupToolboxKeys returns the canonical env keys for the given toolboxes.
+func dedupToolboxKeys(toolboxes []nextstep.ResourceRef) []string {
+	seen := make(map[string]struct{}, len(toolboxes))
+	keys := make([]string, 0, len(toolboxes))
+	for _, t := range toolboxes {
+		key := envkey.ToolboxMCPEndpoint(t.Name)
+		if _, ok := seen[key]; ok {
+			continue
+		}
+		seen[key] = struct{}{}
+		keys = append(keys, key)
+	}
+	slices.Sort(keys)
+	return keys
+}


+func printDoctorReportText(
+	w io.Writer,
+	report doctor.Report,
+	trailing []nextstep.Suggestion,
+	showNext bool,
+) error {
+	if err := printDoctorReportTextHeader(w); err != nil {
+		return err
+	}
+
+	for _, c := range report.Checks {
+		if err := writeCheckLines(w, c); err != nil {
+			return err
+		}
+	}
+
+	return printDoctorReportTextFooter(w, report, trailing, showNext)
+}


 	var result map[string]any
 	if err := json.Unmarshal(respBody, &result); err != nil {
 		// Not JSON — just print raw response
 		fmt.Println(string(respBody))
+		a.emitInvokeSuccessNextStep(nextstep.InvokeLocal, "")
 		return nil
 	}

-	return printAgentResponse(result, "local")
+	if err := printAgentResponse(result, "local"); err != nil {
+		return err
+	}
+	a.emitInvokeSuccessNextStep(nextstep.InvokeLocal, "")
+	return nil


+	// AgentVersionIdle is a defensive synonym for "active" — the issue
+	// #7975 spec lists `idle`, although the verified enum only emits `active`.
+	AgentVersionIdle     AgentVersionStatus = "idle"
+	AgentVersionFailed   AgentVersionStatus = "failed"
+	AgentVersionDeleting AgentVersionStatus = "deleting"
+	AgentVersionDeleted  AgentVersionStatus = "deleted"
+)


Add project-root validation for service-relative file probes and reads, including symlink-aware containment checks and root-service handling.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove the unused nextstep auth probe option and state fields that were added by the PR but never consumed by the resolver or doctor wiring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Route PR-added nextstep stdout emission through one cmd helper so TTY gating stays consistent across init, invoke, run, show, and doctor text output. This intentionally applies the nextstep call-site TTY contract to the PR-added init next-step blocks as well, keeping redirected output free of human-only guidance. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jongio

Solid foundation for the doctor + nextstep subsystems - clean module boundaries, interface-driven testability, and well-structured check pipeline. A few things to sort out before this leaves draft:

invoke.go:541-543 is the one I'd prioritize - it silently changes CLI semantics for --name. The rest are lower-stakes but worth a look.

jongio · 2026-05-17T19:25:53Z

+		rc.serviceName = info.ServiceName
+		if info.AgentName != "" {
 			rc.name = info.AgentName
 		}


This drops the rc.name == "" guard that preserved a user-supplied --name flag. Before this PR, azd ai agent invoke --name foo kept foo as the agent name even when info.AgentName was set. Now info.AgentName always wins, silently overriding the explicit flag.

If that's intentional (e.g. info.AgentName is the canonical source), a log.Printf when the override happens would help users understand why their --name was ignored. If it's unintentional, restoring the guard is a one-liner:

Suggested change

}

if rc.name == "" && info.AgentName != "" {

jongio · 2026-05-17T19:25:53Z

+			if code == 0 {
+				return nil
+			}
+			os.Exit(code)


os.Exit(code) skips the deferred logCleanup() and azdClient.Close() above. For non-zero exits the log file won't be cleaned up and the client connection leaks.

One fix - return the exit code up to the caller and let Cobra handle the exit, or call the cleanup explicitly before os.Exit:

if code != 0 { logCleanup() if azdClient != nil { azdClient.Close() } os.Exit(code) }

jongio · 2026-05-17T19:25:53Z

+				Details: map[string]any{
+					"validForMinutes": minutes,
+					"upn":             res.upn,
+				},


The Options parameter is unused (_ on line 104), so UPN ends up in Details and Message regardless of --unredacted. If a user runs azd ai agent doctor without --unredacted, they'd still see their UPN in the output.

Either wire opts.Unredacted through and conditionally redact, or drop the UPN from the default output and only include it when --unredacted is set.

jongio · 2026-05-17T19:25:54Z

+		return nil
+	}
+
+	state, _ := nextstep.AssembleStateFromSource(ctx, nextstep.NewSource(azdClient))


The error from AssembleStateFromSource is discarded here. The state is always non-nil (so no crash risk), but if assembly partially fails the trailing Next block will be based on incomplete data without any indication to the user.

Consider logging the errors at debug level so --debug runs surface what went wrong:

state, errs := nextstep.AssembleStateFromSource(ctx, nextstep.NewSource(azdClient)) for _, e := range errs { log.Printf("doctor trailing state: %v", e) }

Copilot AI review requested due to automatic review settings May 15, 2026 04:00

antriksh30 requested review from JeffreyCA, hemarina, jongio, rajeshkamal5050, tg-msft, therealjohn, trangevi, trrwilson, vhvb1989, wbreza and weikanglim as code owners May 15, 2026 04:00

Copilot AI reviewed May 15, 2026

View reviewed changes

antriksh30 changed the title ~~Azure AI Agents: add next-step guidance and doctor diagnostics~~ [WIP] Azure AI Agents: add next-step guidance and doctor diagnostics May 15, 2026

microsoft-github-policy-service Bot assigned antriksh30 May 15, 2026

Antriksh Jain and others added 15 commits May 15, 2026 09:31

chore: ignore local design-spec mirror folder (.tmp/)

75898a0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Antriksh Jain and others added 16 commits May 15, 2026 09:42

antriksh30 force-pushed the antrikshjain/next-step-implementation branch from 401ca65 to 829063a Compare May 15, 2026 04:17

antriksh30 requested a review from Copilot May 15, 2026 04:55

Copilot AI reviewed May 15, 2026

View reviewed changes

Antriksh Jain and others added 2 commits May 15, 2026 10:28

azd ai agents: fix doctor CI failures

5f125d3

Remove the reserved doctor --output flag path, apply go fix modernizations, and reword cspell-sensitive text. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azd ai agents: restore existing comments

d05b2ee

Restore comments that predated this PR in existing files so the cleanup only affects new or changed comment surfaces. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

antriksh30 requested a review from Copilot May 15, 2026 05:19

Copilot started reviewing on behalf of antriksh30 May 15, 2026 05:20 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

Antriksh Jain and others added 3 commits May 15, 2026 12:18

azd ai agents: validate service-relative paths

1988c3e

Add project-root validation for service-relative file probes and reads, including symlink-aware containment checks and root-service handling.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azd ai agents: remove unused nextstep auth state

21eada5

Remove the unused nextstep auth probe option and state fields that were added by the PR but never consumed by the resolver or doctor wiring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jongio reviewed May 17, 2026

View reviewed changes

antriksh30 marked this pull request as ready for review May 18, 2026 04:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Azure AI Agents: add next-step guidance and doctor diagnostics#8198

[WIP] Azure AI Agents: add next-step guidance and doctor diagnostics#8198
antriksh30 wants to merge 81 commits into
Azure:mainfrom
antriksh30:antrikshjain/next-step-implementation

antriksh30 commented May 15, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

jongio left a comment

Uh oh!

jongio May 17, 2026

Uh oh!

jongio May 17, 2026

Uh oh!

jongio May 17, 2026

Uh oh!

jongio May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

antriksh30 commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

jongio left a comment

Choose a reason for hiding this comment

Uh oh!

jongio May 17, 2026

Choose a reason for hiding this comment

Uh oh!

jongio May 17, 2026

Choose a reason for hiding this comment

Uh oh!

jongio May 17, 2026

Choose a reason for hiding this comment

Uh oh!

jongio May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

antriksh30 commented May 15, 2026 •

edited

Loading