
Commit f221f6c

test: evals audit datasets to dev-facing cases (#16433)
# Overview

Trims `test/evals/datasets/` so the eval suites measure knowledge a developer applies while building an application with Payload, not knowledge a Payload-monorepo contributor needs. Also adds shorthand npm scripts for running individual eval suites.

## Key Changes

- **Trimmed `conventions/qa.ts` from 10 cases to 1**
  - Dropped 9 cases lifted from `CLAUDE.md` (types vs interfaces, boolean naming, function vs class, translation paths, `afterEach` cleanup, conventional-commits, dev-server flags, auto-login creds, single-object-parameter convention).
  - Kept the `payload.logger.error` shape case, the only one that describes a call shape a Payload consumer writes in their own code.
- **Removed `plugins/official/qa.ts`**
  - 11 reference-doc QA cases ("what does plugin X do") testing recall, not application. The borderline MCP-config case is already covered by `plugins/official/codegen.ts` via real code generation.
  - `eval.official-plugins.spec.ts` updated to drop the QA registration; codegen registration unchanged.
- **Corrected the audience map in `EvalDashboard/audience.ts`**
  - `negative` retagged from `maintainers` to `users`. Six of seven retained `negative` cases are dev-facing (debugging your own broken config); the map can't split sub-arrays, so `users` is the better representative tag.
  - Removed three category keys (`commits`, `structure`, `testing`) that no longer appear in any dataset after the conventions trim.
- **Added `test:eval:<suite>` shorthand scripts**
  - One per suite (`building-plugins`, `collections`, `config`, `conventions`, `fields`, `graphql`, `local-api`, `negative`, `official-plugins`, `rest-api`). Each delegates to the `:skill` variant, matching the project-wide default.

## Design Decisions

The dividing line is "would a developer consuming `payload` from npm encounter this?" If no, the case is contributor-only and removed.

Three pre-existing categories were intentionally kept in scope but untouched:

- `negative/codegen.ts` `negativeInvalidInstructionDataset` is an eval-pipeline self-test (it verifies `tsc` rejects bad types) and is preserved as-is.
- `plugins/qa.ts` and `plugins/codegen.ts` stay because developers may colocate plugins inside their own project structure.
- Other dead audience-map keys (`'access-control'`, `admin`, `'building-plugins'`, `conventions`, `hooks`, `'official-plugins'`, `translations`) were dead before this audit and were left to keep the diff focused.

`conventions/qa.ts` and `eval.conventions.spec.ts` are kept rather than deleted so the surviving `coding`-category case still runs as a registered suite.
1 parent b519801 commit f221f6c

5 files changed

Lines changed: 12 additions & 129 deletions


package.json

Lines changed: 10 additions & 0 deletions
@@ -122,35 +122,45 @@
     "test:e2e:prod:run:noturbo": "pnpm runts ./test/runE2E.ts --prod --no-turbo",
     "test:eval": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 vitest --run --project eval",
     "test:eval:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 vitest --run --project eval",
+    "test:eval:building-plugins": "pnpm run test:eval:building-plugins:skill",
     "test:eval:building-plugins:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.building-plugins.spec",
     "test:eval:building-plugins:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.building-plugins.spec",
     "test:eval:building-plugins:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.building-plugins.spec",
+    "test:eval:collections": "pnpm run test:eval:collections:skill",
     "test:eval:collections:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.collections.spec",
     "test:eval:collections:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.collections.spec",
     "test:eval:collections:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.collections.spec",
+    "test:eval:config": "pnpm run test:eval:config:skill",
     "test:eval:config:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.config.spec",
     "test:eval:config:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.config.spec",
     "test:eval:config:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.config.spec",
+    "test:eval:conventions": "pnpm run test:eval:conventions:skill",
     "test:eval:conventions:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.conventions.spec",
     "test:eval:conventions:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.conventions.spec",
     "test:eval:conventions:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.conventions.spec",
+    "test:eval:fields": "pnpm run test:eval:fields:skill",
     "test:eval:fields:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.fields.spec",
     "test:eval:fields:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.fields.spec",
     "test:eval:fields:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.fields.spec",
+    "test:eval:graphql": "pnpm run test:eval:graphql:skill",
     "test:eval:graphql:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.graphql.spec",
     "test:eval:graphql:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.graphql.spec",
     "test:eval:graphql:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.graphql.spec",
+    "test:eval:local-api": "pnpm run test:eval:local-api:skill",
     "test:eval:local-api:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.local-api.spec",
     "test:eval:local-api:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.local-api.spec",
     "test:eval:local-api:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.local-api.spec",
     "test:eval:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 vitest --run --project eval",
+    "test:eval:negative": "pnpm run test:eval:negative:skill",
     "test:eval:negative:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.negative.spec",
     "test:eval:negative:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.negative.spec",
     "test:eval:negative:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.negative.spec",
+    "test:eval:official-plugins": "pnpm run test:eval:official-plugins:skill",
     "test:eval:official-plugins:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.official-plugins.spec",
     "test:eval:official-plugins:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.official-plugins.spec",
     "test:eval:official-plugins:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.official-plugins.spec",
     "test:eval:report": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 vitest --run --project eval --reporter=default --reporter=html --outputFile.html=test/evals/eval-results/report.html",
+    "test:eval:rest-api": "pnpm run test:eval:rest-api:skill",
     "test:eval:rest-api:baseline": "cross-env EVAL_VARIANT=baseline NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.rest-api.spec",
     "test:eval:rest-api:low-power": "cross-env EVAL_VARIANT=low-power NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.rest-api.spec",
     "test:eval:rest-api:skill": "cross-env NODE_OPTIONS=\"--no-deprecation --no-experimental-strip-types\" NODE_NO_WARNINGS=1 pnpm exec vitest --run --project eval eval.rest-api.spec",
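
The ten shorthand entries added in this hunk all follow one pattern: `test:eval:<suite>` delegates to `test:eval:<suite>:skill`. As a rough illustration of that pattern only, the sketch below derives the same key/value pairs from the suite list in the commit message. The `shorthandScript` helper and the `SUITES` constant are hypothetical and are not part of the repository; the commit adds the entries to `package.json` by hand.

```typescript
// Suite names as listed in the commit message.
const SUITES = [
  'building-plugins',
  'collections',
  'config',
  'conventions',
  'fields',
  'graphql',
  'local-api',
  'negative',
  'official-plugins',
  'rest-api',
] as const

type Suite = (typeof SUITES)[number]

// Hypothetical helper (assumption, not repository code): builds the shorthand
// script entry for one suite, delegating to that suite's `:skill` variant.
function shorthandScript(suite: Suite): [string, string] {
  return [`test:eval:${suite}`, `pnpm run test:eval:${suite}:skill`]
}

// One shorthand entry per suite, keyed by script name.
const shorthandScripts: Record<string, string> = Object.fromEntries(
  SUITES.map((suite) => shorthandScript(suite)),
)
```

With these entries in place, `pnpm run test:eval:collections` runs the same command as `pnpm run test:eval:collections:skill`.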

test/evals/components/EvalDashboard/audience.ts

Lines changed: 1 addition & 4 deletions
@@ -13,19 +13,16 @@ const CATEGORY_AUDIENCES: Record<string, Audience[]> = {
   admin: ['admins', 'users'],
   'building-plugins': ['maintainers', 'users'],
   collections: ['users'],
-  commits: ['maintainers'],
   config: ['users'],
   conventions: ['maintainers'],
   fields: ['users'],
   graphql: ['users'],
   hooks: ['users'],
   'local-api': ['users'],
-  negative: ['maintainers'],
+  negative: ['users'],
   'official-plugins': ['users'],
   plugins: ['users'],
   'rest-api': ['users'],
-  structure: ['maintainers'],
-  testing: ['maintainers'],
   translations: ['maintainers'],
 }
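
The commit message notes that the map "can't split sub-arrays": a category maps to one audience list, so every case in that category shares the same tag. A minimal sketch of such a lookup follows, under stated assumptions: the `Audience` union is inferred from the values visible in this hunk, and `audiencesForCategory` with its empty-list fallback is a hypothetical helper, not the dashboard's actual code.

```typescript
// Assumption: the Audience union is inferred from the values in the diff.
type Audience = 'admins' | 'maintainers' | 'users'

// A subset of the map as it stands after this commit.
const CATEGORY_AUDIENCES: Record<string, Audience[]> = {
  'building-plugins': ['maintainers', 'users'],
  collections: ['users'],
  conventions: ['maintainers'],
  negative: ['users'],
}

// Hypothetical lookup (assumption): returns the audiences for a category,
// or the fallback when the category has no entry (e.g. a removed key).
function audiencesForCategory(category: string, fallback: Audience[] = []): Audience[] {
  return CATEGORY_AUDIENCES[category] ?? fallback
}

// Every `negative` case resolves to the same tag, whichever sub-array it is in.
const negativeAudiences = audiencesForCategory('negative')
// `commits` was removed in this commit, so it resolves to the fallback.
const commitsAudiences = audiencesForCategory('commits')
```

Because the lookup is keyed on category alone, tagging `negative` as `users` reflects the majority of its cases rather than each case individually.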

test/evals/datasets/conventions/qa.ts

Lines changed: 0 additions & 51 deletions
@@ -3,61 +3,10 @@ import type { EvalCase } from '../../types.js'
 export type { EvalCase }
 
 export const conventionsQADataset: EvalCase[] = [
-  {
-    input:
-      'In Payload, should you prefer types or interfaces when defining TypeScript data shapes?',
-    expected: 'types should be preferred over interfaces, except when extending external types',
-    category: 'coding',
-  },
-  {
-    input: 'What naming convention should be used for boolean variables in Payload code?',
-    expected:
-      'booleans should be prefixed with is, has, can, or should — for example isValid, hasData, canEdit, shouldRun',
-    category: 'coding',
-  },
-  {
-    input: 'Should Payload code prefer functions or classes?',
-    expected: 'functions are preferred over classes; classes are only used for errors and adapters',
-    category: 'coding',
-  },
   {
     input: 'When passing an error to payload.logger.error, what is the correct format?',
     expected:
       'use an object with msg and err keys, like payload.logger.error({ msg: "message", err: error }); do not pass the error as a second argument',
     category: 'coding',
   },
-  {
-    input: 'Where do translation files live in the Payload monorepo?',
-    expected: 'packages/translations/src/languages/',
-    category: 'structure',
-  },
-  {
-    input: 'What is the pattern for cleaning up database records created during a Payload test?',
-    expected:
-      'tests must delete any records they create; use afterEach with a shared array of created IDs to centralize cleanup, then clear the array',
-    category: 'testing',
-  },
-  {
-    input: 'What format should the first commit on a new Payload branch follow? Give an example.',
-    expected:
-      'conventional commits format: <type>(<scope>): <lowercase title> — for example feat(db-mongodb): add support for transactions or fix(ui): json field type ignoring editorOptions',
-    category: 'commits',
-  },
-  {
-    input: 'How do you start the Payload dev server using a specific test config directory?',
-    expected:
-      'run pnpm run dev <directory_name>, for example pnpm run dev fields loads test/fields/config.ts',
-    category: 'development',
-  },
-  {
-    input: 'What are the default auto-login credentials when running the Payload dev server?',
-    expected: 'email dev@payloadcms.com and password test',
-    category: 'development',
-  },
-  {
-    input:
-      'In Payload functions, should parameters be passed as individual arguments or as a single object?',
-    expected: 'prefer single object parameters to improve backwards-compatibility',
-    category: 'coding',
-  },
 ]
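
The one retained case concerns the call shape for `payload.logger.error`: a single object with `msg` and `err` keys, rather than the error as a second argument. The sketch below illustrates that shape with a stand-in logger so it runs without a Payload instance. The `logger` object, the `ErrorLogEntry` type, and `riskyOperation` are hypothetical and exist only for this illustration; they are not Payload's actual logger implementation.

```typescript
// Hypothetical entry type (assumption) mirroring the shape the eval case expects.
interface ErrorLogEntry {
  msg: string
  err: unknown
}

// Stand-in logger (assumption): records entries instead of writing output.
const logged: ErrorLogEntry[] = []
const logger = {
  error(entry: ErrorLogEntry): void {
    logged.push(entry)
  },
}

// Hypothetical operation that always fails, to exercise the catch branch.
function riskyOperation(): never {
  throw new Error('database unavailable')
}

try {
  riskyOperation()
} catch (error) {
  // Expected shape: one object carrying both the message and the error.
  logger.error({ msg: 'Failed to sync documents', err: error })
  // Shape the eval case warns against:
  //   logger.error('Failed to sync documents', error)
}
```

In an application, the same object would be passed to `payload.logger.error` in place of the stand-in `logger.error` call.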

test/evals/datasets/plugins/official/qa.ts

Lines changed: 0 additions & 71 deletions
This file was deleted.

eval.official-plugins.spec.ts

Lines changed: 1 addition & 3 deletions
@@ -1,14 +1,12 @@
 import { describe } from 'vitest'
 
 import { pluginsOfficialCodegenDataset } from './datasets/plugins/official/codegen.js'
-import { pluginsOfficialQADataset } from './datasets/plugins/official/qa.js'
-import { registerCodegenCases, registerQACases } from './suites/helpers.js'
+import { registerCodegenCases } from './suites/helpers.js'
 import { resolveVariantOptions } from './variantOptions.js'
 
 const options = resolveVariantOptions()
 const { labelSuffix = '' } = options
 
 describe(`Official Plugins${labelSuffix}`, () => {
-  registerQACases(pluginsOfficialQADataset, 'Official Plugins: QA', options)
   registerCodegenCases(pluginsOfficialCodegenDataset, 'Official Plugins: Codegen', options)
 })
