LCORE-2079: added agent skills e2e feature file#1742
Conversation
WalkthroughAdds a Gherkin end-to-end feature file defining agent skills tests: setup/background, REST /tools checks, list/activate/load tool flows (query and streaming), error cases, conversation deduplication, multi-skill discovery, and progressive disclosure scenarios. ChangesAgent Skills E2E Test Suite
🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers:
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/features/skills.feature`:
- Line 7: The feature file contains a hardcoded Authorization header in the step
"And I set the Authorization header to Bearer …"; replace that literal token
with a placeholder backed by an env/fixture (e.g., use a Cucumber parameter or
placeholder like "<AUTH_TOKEN>" or a step that reads process.env.TEST_TOKEN) and
update the test bootstrap/fixture loader to inject a real token at runtime (or
add a dedicated step that loads/sets a token from a fixture or env). Ensure the
step implementation that handles "I set the Authorization header" reads the
placeholder and substitutes the runtime token so no static secrets remain in the
feature text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 987e5347-9fc7-41ee-afd7-775a3c1cba2d
📒 Files selected for processing (1)
tests/e2e/features/skills.feature
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing with Gherkin feature files
Files:
tests/e2e/features/skills.feature
🔇 Additional comments (1)
tests/e2e/features/skills.feature (1)
1-6: LGTM!Also applies to: 8-377
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/features/skills.feature`:
- Around line 19-42: The scenario step "And The body of the response is the
following" in features/skills.feature currently contains placeholder, non-JSON
payloads (Python-like True/None and trailing commas); replace each placeholder
block with real, parseable contract assertions: valid JSON (true/null, no
trailing commas) or explicit Gherkin-friendly key checks (e.g., tools array
length, expected tool.identifier/tool.description/tool.parameters entries), and
ensure assertions cover all actual tool entries (not only skill tools); update
all similar blocks referenced in the comment so the test validates the real
response shape rather than parsing invalid placeholders.
- Around line 13-42: The feature tests for the /tools endpoint use the same
expected response body for both the "Skill tools are registered when skills are
configured" and the complementary scenario, so they don't assert opposite
conditions; update the Scenario "Skill tools are registered when skills are
configured" and the complementary scenario (lines referenced around the scenario
titles and the step 'When I access REST API endpoint "tools" using HTTP GET
method') to assert opposite expectations: one should assert that the response
body includes the specific skill-tool identifier(s) like "filesystem_read" and
their provider_id/toolgroup_id entries, and the other should assert that those
skill-tool identifier(s) are absent (negated assertions) while still validating
the overall response shape and status 200; locate and change the step
implementations that compare the response body so they perform inclusion vs
exclusion checks for the "filesystem_read" identifier (and analogous identifiers
in the other scenario) rather than using the identical placeholder body for both
tests.
- Line 1: Remove the file-level `@skip` annotation at the top of the feature file
(the line containing "`@e2e_group_2` `@skip`") so the entire E2E suite in
skills.feature runs; if certain scenarios still need to be skipped, apply `@skip`
only to those individual Scenario/Scenario Outline blocks instead of the file
header to avoid disabling all tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5f23fca9-e75c-4ea5-ac50-8da87ff190b5
📒 Files selected for processing (1)
tests/e2e/features/skills.feature
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing with Gherkin feature files
Files:
tests/e2e/features/skills.feature
| @@ -0,0 +1,432 @@ | |||
| @e2e_group_2 @skip | |||
There was a problem hiding this comment.
Remove file-level @skip; it disables the entire new E2E suite.
With @skip on Line 1, none of these scenarios execute, so this PR adds no effective coverage.
Suggested minimal fix
-@e2e_group_2 `@skip`
+@e2e_group_2📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @e2e_group_2 @skip | |
| `@e2e_group_2` |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/features/skills.feature` at line 1, Remove the file-level `@skip`
annotation at the top of the feature file (the line containing "`@e2e_group_2`
`@skip`") so the entire E2E suite in skills.feature runs; if certain scenarios
still need to be skipped, apply `@skip` only to those individual Scenario/Scenario
Outline blocks instead of the file header to avoid disabling all tests.
| And The body of the response is the following #TODO: Currently placeholder, should reflect actual tools (all tools not just skill tools) | ||
| """ | ||
| { | ||
| "tools": [ | ||
| { | ||
| "identifier": "filesystem_read", | ||
| "description": "Read contents of a file from the filesystem", | ||
| "parameters": [ | ||
| { | ||
| "name": "path", | ||
| "description": "Path to the file to read", | ||
| "parameter_type": "string", | ||
| "required": True, | ||
| "default": None, | ||
| } | ||
| ], | ||
| "provider_id": "model-context-protocol", | ||
| "toolgroup_id": "filesystem-tools", | ||
| "server_source": "http://localhost:3000", | ||
| "type": "tool", | ||
| } | ||
| ], | ||
| } | ||
| """ |
There was a problem hiding this comment.
Replace placeholder expected payloads with real, parseable contract assertions.
Most scenarios still assert TODO placeholders ("content": "bla"), and the blocks are not valid JSON (True/None/trailing commas). Once unskipped, this will either fail parsing or validate the wrong behavior.
Also applies to: 49-72, 87-98, 114-125, 141-152, 168-179, 196-207, 223-234, 249-260, 274-285, 298-309, 323-334, 350-361, 368-379, 395-406, 421-432
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/features/skills.feature` around lines 19 - 42, The scenario step
"And The body of the response is the following" in features/skills.feature
currently contains placeholder, non-JSON payloads (Python-like True/None and
trailing commas); replace each placeholder block with real, parseable contract
assertions: valid JSON (true/null, no trailing commas) or explicit
Gherkin-friendly key checks (e.g., tools array length, expected
tool.identifier/tool.description/tool.parameters entries), and ensure assertions
cover all actual tool entries (not only skill tools); update all similar blocks
referenced in the comment so the test validates the real response shape rather
than parsing invalid placeholders.
|
Ready For Review: |
| {"query": "What skills are available? Use the list_skills tool.", "model": "{MODEL}", "provider": "{PROVIDER}"} | ||
| """ | ||
| Then The status code of the response is 200 | ||
| And The body of the "tool_results" field is #TODO: Currently placeholder, should reflect actual tool results |
There was a problem hiding this comment.
you already add the skills at the top of this scenario, so why not list them as part of this step.
There was a problem hiding this comment.
Im not sure what you mean here? I ensured that the skills were added here but then am just testing to make sure the list_skills tool is used.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tests/e2e/features/skills.feature (4)
456-470:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDeduplication scenario: first activation should assert
success, notfailure.This scenario tests that re-activating the same skill in the same conversation yields a dedup/failure notice on the second call. The first
activate_skill(Lines 452-455) is the initial activation and should succeed; only the second call at Lines 472-475 should returnfailure/already-loaded. Currently the first block at Line 464 already expects"status": "failure", which contradicts the scenario name ("Duplicate skill activation … returns already-loaded note") and would pass even if dedup logic were broken (e.g., if both calls failed for unrelated reasons).🛠️ Proposed fix
- And The body of the "tool_results" field is `#TODO`: Currently placeholder, should reflect actual tool results + And The body of the "tool_results" field is `#TODO`: Currently placeholder, should reflect actual tool results """ [ { "id": "<call_id>", - "name": "activate_skill" - "status": "failure", + "name": "activate_skill", + "status": "success", "content": "<tool_call content>", "type": "tool_result", "round": 1 } ] """🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/features/skills.feature` around lines 456 - 470, The test expectation for the first activation is wrong: update the tool_results expected JSON for the initial activate_skill call so its "status" is "success" (not "failure") and any related "content" or "round" fields match the initial call; ensure the second activation block (the later activate_skill assertion) still expects "failure"/already-loaded. Locate the assertions referencing "tool_results" and the "activate_skill" call id/round in the skills.feature scenario and change only the first block's "status" to "success" to reflect the intended deduplication flow.
173-227: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftMissing coverage: no scenario actually exercises a loaded skill's behavior.
Per prior reviewer feedback, the suite still lacks a scenario that uses a skill after activation (i.e., asks a domain question whose answer depends on the skill's instructions/resources being loaded, and asserts on the assistant response — not just on
tool_results). The progressive-disclosure scenarios at Lines 548-631 chain the three tools but only assert the threetool_resultentries; they don't validate that the model's final answer reflects the skill content. Consider adding at least one scenario that activates a skill and then asks a follow-up question whose expected answer is only achievable with the skill loaded.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/features/skills.feature` around lines 173 - 227, Add a new Gherkin scenario that actually exercises a loaded skill (not just tool_results): start from the same setup used in the existing scenarios (e2e-test-skill directory, lightspeed-stack-skills-auth-noop-token.yaml, restart/service and capture token metrics), then call the query or streaming_query endpoint with a prompt that requires the e2e-test-skill's behavior (a follow-up or domain-specific question whose correct assistant response depends on the skill being loaded) and assert the assistant's final answer/content in the response body (not only the "tool_results" array). Reference the existing scenario names and fields ("list_skills", "tool_results", "streaming_query") and assert that the assistant response contains the expected skill-driven text and that token metrics increased. Ensure the new scenario verifies both the tool invocation and the assistant's final answer reflecting the skill.
494-517:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMulti-skill discovery scenarios don't assert both skills are returned.
The whole point of the
@SkillsMultiConfigscenarios (Lines 494-517 and 519-544) is to verify that bothe2e-test-skillande2e-second-skillare discovered from subdirectories. The current expectedtool_resultsonly contains a single placeholderlist_skillsentry with"<tool_call content>", so the assertion would pass even if only one (or zero) skills were discovered. Populate the expectedcontentwith both skill names so the test actually validates multi-skill discovery.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/features/skills.feature` around lines 494 - 517, Update the assertions for the `@SkillsMultiConfig` scenario so the expected "tool_results" content for the list_skills tool includes both discovered skill names ("e2e-test-skill" and "e2e-second-skill") instead of the single placeholder "<tool_call content>"; edit the Scenario block where the query asks to "List all available skills using the list_skills tool." and replace the placeholder tool_results body to explicitly contain entries (or a single entry payload) that lists both e2e-test-skill and e2e-second-skill to ensure multi-skill discovery is validated.
186-186: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAddress prior reviewer feedback: list the skills as part of the assertion step.
Per earlier review feedback on this PR, the activated/listed skills should be enumerated in the expected
tool_resultscontent here (and in the analogous steps in the multi-skill scenarios at Lines 505 and 532) rather than left as<tool_call content>. The skill set is known at scenario authoring time (e.g.,e2e-test-skill,e2e-second-skill), so populate the expectedcontentaccordingly instead of relying entirely on runtime substitution.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/features/skills.feature` at line 186, Update the Gherkin assertion step "And The body of the \"tool_results\" field is" in tests/e2e/features/skills.feature to replace the placeholder comment with concrete expected tool_result content that enumerates the activated skills (e.g., include entries for "e2e-test-skill" and "e2e-second-skill" where applicable) so the assertion checks for those skill names and their expected outputs; do the same for the analogous multi-skill assertion steps (the other occurrences of the same step in the file) so each expected `content` block explicitly lists the known skills and their expected results instead of leaving `<tool_call content>` or a TODO.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/features/skills.feature`:
- Line 309: Replace the malformed single-line closing token "} ]" inside
the heredoc block with two separate lines (first "}" then "]"), matching the
one-bracket-per-line formatting used elsewhere in
tests/e2e/features/skills.feature; ensure indentation matches surrounding
heredoc JSON and that the overall JSON payload is valid per the broader payload
comment before committing.
---
Outside diff comments:
In `@tests/e2e/features/skills.feature`:
- Around line 456-470: The test expectation for the first activation is wrong:
update the tool_results expected JSON for the initial activate_skill call so its
"status" is "success" (not "failure") and any related "content" or "round"
fields match the initial call; ensure the second activation block (the later
activate_skill assertion) still expects "failure"/already-loaded. Locate the
assertions referencing "tool_results" and the "activate_skill" call id/round in
the skills.feature scenario and change only the first block's "status" to
"success" to reflect the intended deduplication flow.
- Around line 173-227: Add a new Gherkin scenario that actually exercises a
loaded skill (not just tool_results): start from the same setup used in the
existing scenarios (e2e-test-skill directory,
lightspeed-stack-skills-auth-noop-token.yaml, restart/service and capture token
metrics), then call the query or streaming_query endpoint with a prompt that
requires the e2e-test-skill's behavior (a follow-up or domain-specific question
whose correct assistant response depends on the skill being loaded) and assert
the assistant's final answer/content in the response body (not only the
"tool_results" array). Reference the existing scenario names and fields
("list_skills", "tool_results", "streaming_query") and assert that the assistant
response contains the expected skill-driven text and that token metrics
increased. Ensure the new scenario verifies both the tool invocation and the
assistant's final answer reflecting the skill.
- Around line 494-517: Update the assertions for the `@SkillsMultiConfig` scenario
so the expected "tool_results" content for the list_skills tool includes both
discovered skill names ("e2e-test-skill" and "e2e-second-skill") instead of the
single placeholder "<tool_call content>"; edit the Scenario block where the
query asks to "List all available skills using the list_skills tool." and
replace the placeholder tool_results body to explicitly contain entries (or a
single entry payload) that lists both e2e-test-skill and e2e-second-skill to
ensure multi-skill discovery is validated.
- Line 186: Update the Gherkin assertion step "And The body of the
\"tool_results\" field is" in tests/e2e/features/skills.feature to replace the
placeholder comment with concrete expected tool_result content that enumerates
the activated skills (e.g., include entries for "e2e-test-skill" and
"e2e-second-skill" where applicable) so the assertion checks for those skill
names and their expected outputs; do the same for the analogous multi-skill
assertion steps (the other occurrences of the same step in the file) so each
expected `content` block explicitly lists the known skills and their expected
results instead of leaving `<tool_call content>` or a TODO.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 51c93d05-b32d-403c-a41c-b74c7df37543
📒 Files selected for processing (1)
tests/e2e/features/skills.feature
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: build-pr
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing with Gherkin feature files
Files:
tests/e2e/features/skills.feature
🔇 Additional comments (2)
tests/e2e/features/skills.feature (2)
1-1: File-level@skipstill disables the entire suite.
188-198: JSON heredoc payloads are not valid JSON (missing commas + trailing commas).Every
tool_resultsblock in this file is missing the comma after the"name": "..."line (e.g., Line 191"name": "list_skills"is not terminated before"status"on Line 192), and most blocks also have a trailing comma after"round": 1,before the closing}(Line 195). The/toolsnegative scenario likewise has a stray],then}at Lines 169-170. Once@skipis removed and the bodies are parsed, every scenario will fail JSON parsing rather than exercise the contract. Apply the same fix to all sibling blocks listed in the previous review (188-198, 215-226, 243-254, 271-282, 300-310, 327-338, 354-365, 380-391, 405-416, 431-442, 459-470, 478-489, 506-517, 533-544, 560-587, 603-631).
| "content": "<tool_call content>", | ||
| "type": "tool_result", | ||
| "round": 1, | ||
| } ] |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Formatting glitch: } ] on a single line.
Minor inconsistency vs. the formatting used in every other heredoc block in this file. Once you fix the JSON syntax (per the broader payload comment), please normalize this to one bracket per line as elsewhere.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/features/skills.feature` at line 309, Replace the malformed
single-line closing token "} ]" inside the heredoc block with two separate
lines (first "}" then "]"), matching the one-bracket-per-line formatting used
elsewhere in tests/e2e/features/skills.feature; ensure indentation matches
surrounding heredoc JSON and that the overall JSON payload is valid per the
broader payload comment before committing.
Description
Added E2E feature file for agent skills.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit