ci: run evals in CI with Gemini, add nightly schedule and failure notifications by WilliamBergamin · Pull Request #70 · slackapi/slack-mcp-plugin

WilliamBergamin · 2026-07-02T21:12:53Z

Summary

Migrates the LLM-judged eval suite from a local Ollama judge to Google's Gemini free-tier API, and makes CI actually run the evals.

The eval job was added but had no secrets wired up, so TestToolSelection — gated on SLACK_MCP_TOKEN, the only test in tests/eval/ — skipped entirely and the job went green having evaluated nothing. This PR:

Wires GEMINI_API_KEY and SLACK_MCP_TOKEN into the eval job's env.
Adds a nightly schedule so regressions surface independent of PR traffic.
Adds a notifications job that posts to Slack when lint/test/eval fail on main.
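
For illustration, the silent-skip problem described above could be guarded against with a gate like the one below. This is a minimal sketch, not the code in this PR: `missing_eval_secrets` and `eval_gate` are hypothetical names, and the choice to fail (rather than skip) when secrets are absent in CI is an assumption based on the "ci fails if eval dependencies not found" commit title.

```python
import os
from collections.abc import Mapping

# Assumption: these are the two secrets the eval job needs, per the summary above.
REQUIRED_EVAL_SECRETS = ("GEMINI_API_KEY", "SLACK_MCP_TOKEN")


def missing_eval_secrets(env: Mapping[str, str]) -> list[str]:
    """Return the required secret names that are unset or empty in env."""
    return [name for name in REQUIRED_EVAL_SECRETS if not env.get(name)]


def eval_gate(env: Mapping[str, str]) -> str:
    """Decide what the eval suite should do: 'run', 'skip', or 'fail'.

    Locally, missing secrets lead to a skip. In CI (GitHub Actions sets
    CI=true), missing secrets are treated as a failure so the job cannot
    go green without having evaluated anything.
    """
    if not missing_eval_secrets(env):
        return "run"
    return "fail" if env.get("CI") == "true" else "skip"


if __name__ == "__main__":
    # Report the decision for the current environment without printing secrets.
    print(eval_gate(os.environ), missing_eval_secrets(os.environ))
```

A conftest hook or a fixture could call `eval_gate` and turn "fail" into a hard error, which keeps local runs convenient while making a misconfigured CI job visible.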

Preview

N/A — CI/tooling change, no user-facing UI.

Testing

make lint
make test-unit
ruff format --check tests/support/mcp.py
make test-eval

NOTE: Get your own Gemini API key by following https://ai.google.dev/gemini-api/docs/generate-content/get-started

Notes

No changeset: every change here is dev/test/CI infra, not user-facing plugin behavior.

Requirements

I've read and understood the Contributing Guidelines and have done my best effort to follow them.
I've read and agree to the Code of Conduct.
I've run make test and the tests pass.

…ifications

Wire the eval job's GEMINI_API_KEY and SLACK_MCP_TOKEN so the LLM-judged tool-selection suite actually runs in CI instead of skipping silently. Fix the mislabeled step name, align the job's action SHAs with lint/test, and raise its timeout to 10m for the rate-limited scenarios.

Add a nightly schedule and a regression-notifications job that posts to Slack when lint/test/eval fail on main, mirroring the python-slack-sdk workflow.

Also clean up leftovers from the Ollama->Gemini migration: drop the stale "not run in CI" note in AGENTS.md, remove the dangling $(OLLAMA_DIR) from the Makefile clean target, and revert an unrelated reformat in mcp.py.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>

mwbrooks · 2026-07-02T22:25:34Z

    {
        "id": "list-members-platform-team",
-        "prompt": "Who are the members of the #platform-team channel?",
+        "prompt": "Who are the members of the CA1B2C3F5 channel?",


question: Why are we switching the prompt from using a human readable channel name to a channel ID?

If we want to test a real-world prompt, most people (myself included) will ask the MCP to list channel members in #channel-name, not C0123.

mwbrooks

🧪 I'm having trouble getting the tests to run. Here are my steps:

# Clean things up
$ make clean

# Delete the old .env
$ rm .env

# Create a new .env
$ cp .env.example .env

# Add a Gemini API Key and Slack MCP Server API Key
$ vim .env

# Install the dependencies
$ make install

# Run the tests
$ make test-eval

I receive the following errors:

ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[send-message-hello-team] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[read-channel-engineering] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[search-deployment-incident] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[search-channels-mobile] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[read-profile-user] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[list-members-platform-team] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[send-message-release-shipped] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[search-api-migration] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[search-channels-design-system] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-cli-socket-mode] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-block-kit-modal] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-create-app-template] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-post-message-deploy] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-list-members-platform] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-pull-history-engineering] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-user-info-profile] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-add-reaction-releases] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-reply-in-thread] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-read-thread-replies] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-lookup-user-by-email] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[ambiguous-schedule-message-standup] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-scopes] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-which-method-topic] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-pagination] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-missing-scope] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-docs-url] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-rate-limit] - urllib.error.HTTPError: HTTP Error 401: Unauthorized
ERROR tests/eval/test_tool_selection.py::TestToolSelection::test_tool_selection[skill-slack-api-call-with-curl] - urllib.error.HTTPError: HTTP Error 401: Unauthorized

For sanity, I also exported the API keys into my session in case the .env is not being loaded.
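
One way to narrow this down is to confirm which keys the .env actually provides, without printing their values. The sketch below is an assumption-laden helper, not part of the repo: `parse_dotenv` and `describe_keys` are hypothetical names, the parser handles only simple KEY=VALUE lines, and it cannot tell which service is returning the 401.

```python
from pathlib import Path

# Assumption: these are the variable names the eval suite reads.
EXPECTED_KEYS = ("GEMINI_API_KEY", "SLACK_MCP_TOKEN")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate an optional leading "export " and surrounding quotes.
        key = key.strip().removeprefix("export ").strip()
        values[key] = value.strip().strip("'\"")
    return values


def describe_keys(values: dict[str, str]) -> dict[str, str]:
    """Report presence and length of each expected key, never the value."""
    report: dict[str, str] = {}
    for name in EXPECTED_KEYS:
        value = values.get(name, "")
        report[name] = f"set ({len(value)} chars)" if value else "missing or empty"
    return report


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    for name, status in describe_keys(parse_dotenv(text)).items():
        print(f"{name}: {status}")
```

If both keys report as set, the 401 is more likely a rejected credential than a loading problem.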

mwbrooks · 2026-07-02T22:45:46Z

    id: str
    prompt: str
    expected_tool: str
+    acceptable_tools: NotRequired[list[str]]


question: I understand that acceptable_tools are not required, but is expected_tool still required for the test to pass?
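
To make the question concrete, here is one possible reading of the two fields, sketched under assumptions: `expected_tool` stays required, and a selection passes if it matches `expected_tool` or any entry in `acceptable_tools`. The class name `Scenario`, the helper `is_acceptable`, and the tool names in the usage lines are hypothetical; the PR's actual matching logic may differ.

```python
try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older interpreters need the typing_extensions backport
    from typing_extensions import NotRequired, TypedDict


class Scenario(TypedDict):
    id: str
    prompt: str
    expected_tool: str  # still required in this reading
    acceptable_tools: NotRequired[list[str]]  # optional alternates


def is_acceptable(scenario: Scenario, selected_tool: str) -> bool:
    """Pass if the selection is the expected tool or a listed alternate."""
    allowed = {scenario["expected_tool"], *scenario.get("acceptable_tools", [])}
    return selected_tool in allowed


if __name__ == "__main__":
    # Hypothetical tool names, for illustration only.
    scenario: Scenario = {
        "id": "list-members-platform-team",
        "prompt": "Who are the members of the #platform-team channel?",
        "expected_tool": "tool_a",
        "acceptable_tools": ["tool_b"],
    }
    print(is_acceptable(scenario, "tool_a"), is_acceptable(scenario, "tool_c"))
```

Under this reading, omitting `acceptable_tools` simply reduces the check to an exact match on `expected_tool`.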

WilliamBergamin and others added 9 commits June 29, 2026 17:02

tests: use gemini instead of ollama

4037e24

add github action

08a5d8e

make evals save tokens

e9800a7

get things working more

47cfd53

Merge branch 'main' into gemini-key

807bbcf

Merge branch 'main' into gemini-key

4b47670

fix: ci fails if eval dependencies not found

9fe2b8a

Merge branch 'gemini-key' of https://github.com/slackapi/slack-mcp-pl…

45af4a9

…ugin into gemini-key

WilliamBergamin self-assigned this Jul 2, 2026

WilliamBergamin added the test label (Improve or update the tests of this project) Jul 2, 2026

Merge branch 'main' into gemini-key

4167c6f

mwbrooks reviewed Jul 2, 2026

WilliamBergamin wants to merge 10 commits into main from gemini-key

2 participants