Commit f221f6c
test: evals audit datasets to dev-facing cases (#16433)
# Overview
Trims `test/evals/datasets/` so the eval suites measure knowledge a
developer applies while building an application with Payload, not
knowledge a Payload-monorepo contributor needs. Also adds shorthand npm
scripts for running individual eval suites.
## Key Changes
- **Trimmed `conventions/qa.ts` from 10 cases to 1**
  - Dropped 9 cases lifted from `CLAUDE.md` (types vs interfaces, boolean
    naming, function vs class, translation paths, `afterEach` cleanup,
    conventional-commits, dev-server flags, auto-login creds,
    single-object-parameter convention).
  - Kept the `payload.logger.error` shape case, the only one that
    describes a call shape a Payload consumer writes in their own code.
- **Removed `plugins/official/qa.ts`**
  - 11 reference-doc QA cases ("what does plugin X do") testing recall,
    not application. The borderline MCP-config case is already covered by
    `plugins/official/codegen.ts` via real code generation.
  - `eval.official-plugins.spec.ts` updated to drop the QA registration;
    codegen registration unchanged.
- **Corrected the audience map in `EvalDashboard/audience.ts`**
  - `negative` retagged from `maintainers` to `users`. Six of seven
    retained `negative` cases are dev-facing (debugging your own broken
    config); the map can't split sub-arrays, so `users` is the better
    representative tag.
  - Removed three category keys (`commits`, `structure`, `testing`) that
    no longer appear in any dataset after the conventions trim.
- **Added `test:eval:<suite>` shorthand scripts**
  - One per suite (`building-plugins`, `collections`, `config`,
    `conventions`, `fields`, `graphql`, `local-api`, `negative`,
    `official-plugins`, `rest-api`). Each delegates to the `:skill` variant,
    matching the project-wide default.
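
The shorthand scripts follow one pattern: `test:eval:<suite>` forwards to that suite's `:skill` variant. A minimal sketch of the mapping, assuming a `pnpm run` invocation and a helper named `buildShorthandScripts` (both are hypothetical and not taken from the commit; the real `package.json` entries may be worded differently):

```typescript
// Suite names as listed in the commit message.
const suites = [
  'building-plugins',
  'collections',
  'config',
  'conventions',
  'fields',
  'graphql',
  'local-api',
  'negative',
  'official-plugins',
  'rest-api',
] as const

type Suite = (typeof suites)[number]

// Hypothetical helper: builds one shorthand entry per suite.
// The `pnpm run ...:skill` command string is an assumption for illustration.
function buildShorthandScripts(names: readonly Suite[]): Record<string, string> {
  const scripts: Record<string, string> = {}
  for (const name of names) {
    scripts[`test:eval:${name}`] = `pnpm run test:eval:${name}:skill`
  }
  return scripts
}

const shorthandScripts = buildShorthandScripts(suites)
```

Under these assumptions, `shorthandScripts['test:eval:graphql']` holds the command that runs the `graphql` suite's `:skill` variant, so the short name and the explicit variant stay in sync.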
## Design Decisions
The dividing line is "would a developer consuming `payload` from npm
encounter this?" If no, the case is contributor-only and removed.
Three pre-existing categories were intentionally kept in scope but
untouched:
- `negative/codegen.ts` `negativeInvalidInstructionDataset` is an
eval-pipeline self-test (it verifies `tsc` rejects bad types) and is
preserved as-is.
- `plugins/qa.ts` and `plugins/codegen.ts` stay because developers may
colocate plugins inside their own project structure.
- Other dead audience-map keys (`'access-control'`, `admin`,
`'building-plugins'`, `conventions`, `hooks`, `'official-plugins'`,
`translations`) were dead before this audit and were left to keep the
diff focused.
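
Whether an audience-map key is "dead" can be checked mechanically: a key is dead when no dataset case uses it as a category. A rough sketch, assuming a flat category-to-audience record and a `category` field on each case (the type names, `findDeadKeys`, and the sample data are hypothetical and do not reproduce `audience.ts`):

```typescript
type Audience = 'users' | 'maintainers'

// One tag per category key: the map cannot tag individual cases,
// which is why a mixed category gets a single representative audience.
type AudienceMap = Record<string, Audience>

interface EvalCase {
  category: string
}

// Returns the map keys that no dataset case references.
function findDeadKeys(map: AudienceMap, cases: readonly EvalCase[]): string[] {
  const used = new Set(cases.map((c) => c.category))
  return Object.keys(map).filter((key) => !used.has(key))
}

// Illustrative data only; the real map and datasets are larger.
const sampleMap: AudienceMap = {
  coding: 'users',
  negative: 'users',
  commits: 'maintainers',
}
const sampleCases: EvalCase[] = [{ category: 'coding' }, { category: 'negative' }]

// Only `commits` is unused by the sample cases.
const deadKeys = findDeadKeys(sampleMap, sampleCases)
```

A check along these lines could run in a later cleanup to list the remaining dead keys without widening this diff.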
`conventions/qa.ts` and `eval.conventions.spec.ts` are kept rather than
deleted so the surviving `coding`-category case still runs as a
registered suite.

1 parent b519801 commit f221f6c
5 files changed
Lines changed: 12 additions & 129 deletions
File tree
- test/evals
  - components/EvalDashboard
  - datasets
    - conventions
    - plugins/official
This file was deleted.