Add regression tests for current output by DanielEScherzer · Pull Request #99 · rust-lang/thanks

DanielEScherzer · 2026-04-19T06:13:54Z

No description provided.

DanielEScherzer · 2026-04-19T06:42:27Z

Apparently Youngsuk_Kim was supposed to be in all caps as YOUNGSUK_KIM in 1.52.0 - even though it wasn't when I reran cargo run --release locally...

Kobzol

Thanks! Having tests and an easy way to bless them seems generally helpful.

xtqqczze · 2026-04-29T17:13:29Z

Is such an extensive regression test necessary, considering the changes in this patch are ~6 MiB?

Kobzol · 2026-04-29T18:14:28Z

Well, testing the generated HTML is likely the simplest way to check the output of thanks at the moment. The files are very similar and git will likely compress the delta down to a few hundreds KiBs at most, if not less, so I wouldn't worry about that much.

Kobzol

Thanks. I think that this is useful, but I also want to hear from @Mark-Simulacrum.

DanielEScherzer · 2026-04-29T18:44:41Z

Hmm, one disadvantage of the tests is that when the rust-lang/rust mailmap changes, they might start to fail
E.g. I'm pretty sure that the current failure was caused by rust-lang/rust@80777dd, though there aren't any warnings from parsing the mailmap? Asked on the r-l/r PR

I'll wait to hear from Mark-Simulacrum before updating the expected content

We could also add some logic to skip the test on the scheduled website updates and only run it on push and pull_request

xtqqczze · 2026-04-29T18:44:59Z

The repo is currently 3.9MB on disk, so it’d be great if this didn’t increase it to 10MB just for a single test fixture. Perhaps tests/expected could be stored as a compressed archive in the repo and unpacked during the test instead?

Kobzol · 2026-04-29T18:57:11Z

The repo is currently 3.9MB on disk, so it’d be great if this didn’t increase it to 10MB just for a single test fixture. Perhaps tests/expected could be stored as a compressed archive in the repo and unpacked during the test instead?

That sounds like unnecessary complexity to me. The size of the test suite on disk won't really increase dramatically in the future, and 10 MiB seems fine. If I do cargo build, the target directory of this project on Linux has almost 500 MiB, so :)

Hmm, one disadvantage of the tests is that when the rust-lang/rust mailmap changes, they might start to fail
E.g. I'm pretty sure that the current failure was caused by rust-lang/rust@80777dd, though there aren't any warnings from parsing the mailmap? Asked on the r-l/r PR

Hmm, yeah, that's a good point. It's almost as if we should have a fixed mailmap.. Or just a test repo, but then that won't exercise all the quirks that occur in rust-lang/rust.

DanielEScherzer · 2026-04-29T19:00:01Z

Hmm, one disadvantage of the tests is that when the rust-lang/rust mailmap changes, they might start to fail
E.g. I'm pretty sure that the current failure was caused by rust-lang/rust@80777dd, though there aren't any warnings from parsing the mailmap? Asked on the r-l/r PR

Hmm, yeah, that's a good point. It's almost as if we should have a fixed mailmap.. Or just a test repo, but then that won't exercise all the quirks that occur in rust-lang/rust.

We could run the tests with a fixed mailmap:

[test] manually clone r-l/r to the expected place, replace the mailmap with the known/fixed one, then run cargo to generate output, compare
[build] just run cargo normally

DanielEScherzer · 2026-05-04T03:11:31Z

I don't have the rights to rerun failed jobs, so I rebased, the failure should be resolved by now (see #103)

Mark-Simulacrum · 2026-05-09T16:10:54Z

I think mailmap updates breaking this seems like a real concern. Maybe we can leave the contents here with either fixed mailmap or no mailmap at all? Though I believe https://github.com/rust-lang/thanks/blob/master/src/reviewers.rs changes will also require updates.

re:repository size, one of the concerns I'd have is that overtime each change will add a new copy of all the output to the git history, which means we'd grow potentially rapidly. One option could be that we just store hashes of the output: if you break it, it should be possible to generate the old and the new output locally and do a proper diff. But that would prevent meaningful growth of the actual repository (hashes are tiny, after all). If we did that then we could probably avoid fixed mailmaps too? There's not that much churn on the mailmap in practice I think.

DanielEScherzer · 2026-05-09T16:21:36Z

I think mailmap updates breaking this seems like a real concern. Maybe we can leave the contents here with either fixed mailmap or no mailmap at all? Though I believe https://github.com/rust-lang/thanks/blob/master/src/reviewers.rs changes will also require updates.

I think having no mailmap at all would mean that we lose the testing of the merging of mailmap entries, which we might not want to lose

But having a fixed mailmap shouldn't be an issue - since the mailmap is fetched from the current HEAD commit, we would just have the tests check out a specific older revision of the r-l/r repo to run the tests on

Though I believe https://github.com/rust-lang/thanks/blob/master/src/reviewers.rs changes will also require updates.

Those changes would be to this repo and so my thinking was that it wouldn't be problematic to update the tests at the same time

re:repository size, one of the concerns I'd have is that overtime each change will add a new copy of all the output to the git history, which means we'd grow potentially rapidly.

We don't actually need to add each new release (or even all of the historical ones, I just did that since they were all available)

One option could be that we just store hashes of the output: if you break it, it should be possible to generate the old and the new output locally and do a proper diff. But that would prevent meaningful growth of the actual repository (hashes are tiny, after all). If we did that then we could probably avoid fixed mailmaps too? There's not that much churn on the mailmap in practice I think.

That would work, but it means that the process of reviewing changes that change the output would be a bit harder. What if instead the list of contributors was re-formatted in the tests to remove a bunch of the HTML? E.g. instead of (looking at 0.1.0.html)

            <tbody>
                <tr>
                    <td class="bn">1</td>
                    <td class="bn">Brian Anderson</td>
                    <td class="bn">1958</td>
                </tr>
                <tr>
                    <td class="bn">2</td>
                    <td class="bn">Graydon Hoare</td>
                    <td class="bn">1526</td>
                </tr>
...

we had

            <tbody>
1 |  Brian Anderson | 1958
2 | Graydon Hoare | 1526
...

it wouldn't be correct HTML, but it would make output changes easy to review (perhaps even easier than they are with the HTML) and reduce the size needed to store historical releases (saving ~155 characters per entry in the table)

Mark-Simulacrum · 2026-05-09T16:25:53Z

it wouldn't be correct HTML, but it would make output changes easy to review (perhaps even easier than they are with the HTML) and reduce the size needed to store historical releases (saving ~155 characters per entry in the table)

I believe git compresses within the .git storage, so probably those size wins would be fairly minor. I think producing a reasonable diff could be done by comparing against the published artifacts for the most part? Those are presumably 1:1 with the local generated HTML?

I don't have a strong opinion here either way though.

DanielEScherzer · 2026-05-09T20:11:11Z

it wouldn't be correct HTML, but it would make output changes easy to review (perhaps even easier than they are with the HTML) and reduce the size needed to store historical releases (saving ~155 characters per entry in the table)

I believe git compresses within the .git storage, so probably those size wins would be fairly minor. I think producing a reasonable diff could be done by comparing against the published artifacts for the most part? Those are presumably 1:1 with the local generated HTML?

I don't have a strong opinion here either way though.

We could check against the published artifacts, but that requires a network connection and fetching them individually

I've updated this to do the testing against a pinned version of the mailmap (see #104 to confirm that the mailmap is actually read from the pinned version rather than the current latest - by pinning to an older version, the tests start to fail)

Before applying that cleanup, du -sh tests/expected/ returned 6.2M, afterwards 1.4M, so I think the simplifications help and I've applied them. We could further simplify things by replacing the <nav class="flex flex-row justify-center justify-end-l items-center flex-wrap ph2 pl3-ns pr4-ns"> and parts of the <head> tag, but I think this is now small enough

The point of the regression tests is to confirm that the logic to count contributors and their commits does not break, we don't need to store full copies of the HTML to check that. Instead, simplify the contributor entries in the HTML before comparing with the expected output, reducing the disk space needed to store the expected output by just over 77%.

Kobzol · 2026-06-15T12:45:52Z

Sorry for the delay, I didn't have much time for reviews in the past few months. Diffing against the published artifacts sounds like it might work, though once it changes you can't easily go back in history and diff against older versions.

The main thing that we want to test is:

Names of the contributors
Their contribution counts

For specific historical releases (doesn't even have to be all of them). We don't really need to check any HTML output, that's not so interesting for us to have a test for, I think. Is there a simple way to hook into the current rendering logic to produce e.g. a MD file or a JSON file, as you suggested, and store only that to git, and diff that?

DanielEScherzer · 2026-06-15T18:35:55Z

Sorry for the delay, I didn't have much time for reviews in the past few months. Diffing against the published artifacts sounds like it might work, though once it changes you can't easily go back in history and diff against older versions.

The main thing that we want to test is:

Names of the contributors

Their contribution counts

For specific historical releases (doesn't even have to be all of them). We don't really need to check any HTML output, that's not so interesting for us to have a test for, I think. Is there a simple way to hook into the current rendering logic to produce e.g. a MD file or a JSON file, as you suggested, and store only that to git, and diff that?

My worry is that the more logic we add to make the tests simpler, the further they stray from the actual output
The easiest option would be to just replace templates/stats.hbs with something like (so that we still test the release title, in progress, counting, etc.):

{{ release_title }} Contributors

{{#if in_progress }}
We have had {{ count }} individuals contribute to {{ release }} so far.
{{else}}
We have had {{ count }} individuals contribute to {{ release }}.
{{/if}}

Rank | Name | Contributions
{{#each scores as |score| }}
{{score.rank}} | {{score.author}} | {{score.commits}}
{{/each}}

and that way we don't have all of the other parts of the HTML, and don't need to modify the production code. Though personally I would prefer to do the modifications in the test assert_file_content_matches() rather than by switching out the template

As for reducing the number of releases to test against, #37 (comment) reported that there were only unexpected changes in 1 release (1.34.0) so if we don't test every release we run the risk of an untested release page having unexpected changes

DanielEScherzer · 2026-06-15T18:50:26Z

Before additional simplification:

root@ea7197ba2d80:/rust/thanks# du -sh tests/expected/
1.4M    tests/expected/

after removing the sections: 1.3M
after then removing the

sections: 1.2M
removing the remaining HTML bits: 1.1M

So we can drop the other stuff, giving us expected files like

Rust 1.93.1 Contributors
We have had 7 individuals contribute to 1.93.1.
Thank you so much!

1 | Josh Stone | 3
2 | Jieyou Xu | 2
3 | Alex Crichton | 1
3 | Guillaume Gomez | 1
3 | Jonathan Brouwer | 1
3 | Linshu Yang | 1
3 | bors | 1

but it wouldn't save much space on disk. Let me know if I should apply those simplifications

DanielEScherzer force-pushed the add-regression-tests branch from e8cbf0a to c66db0f Compare April 19, 2026 06:41

Kobzol reviewed Apr 29, 2026

View reviewed changes

Comment thread tests/regression-output-test.rs Outdated

Kobzol reviewed Apr 29, 2026

View reviewed changes

Add regression tests for current output

93c5d07

DanielEScherzer force-pushed the add-regression-tests branch from 2e9062c to 6623aba Compare May 4, 2026 03:11

regression-output-test.rs: expect() files rather than using a canary

fae7c9f

DanielEScherzer force-pushed the add-regression-tests branch from 6623aba to fae7c9f Compare May 5, 2026 22:18

Run tests with pinned mailmap

64e048f

DanielEScherzer mentioned this pull request May 9, 2026

Regression test failure #104

Draft

DanielEScherzer force-pushed the add-regression-tests branch from 2757875 to c03a90f Compare May 9, 2026 20:24

DanielEScherzer force-pushed the add-regression-tests branch from c03a90f to 89e4a99 Compare May 9, 2026 20:40

DanielEScherzer requested a review from Kobzol May 12, 2026 19:37

Conversation

DanielEScherzer commented Apr 19, 2026

Uh oh!

DanielEScherzer commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kobzol left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

xtqqczze commented Apr 29, 2026

Uh oh!

Kobzol commented Apr 29, 2026

Uh oh!

Kobzol left a comment

Choose a reason for hiding this comment

Uh oh!

DanielEScherzer commented Apr 29, 2026

Uh oh!

xtqqczze commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kobzol commented Apr 29, 2026

Uh oh!

DanielEScherzer commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DanielEScherzer commented May 4, 2026

Uh oh!

Mark-Simulacrum commented May 9, 2026

Uh oh!

DanielEScherzer commented May 9, 2026

Uh oh!

Mark-Simulacrum commented May 9, 2026

Uh oh!

DanielEScherzer commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kobzol commented Jun 15, 2026

Uh oh!

DanielEScherzer commented Jun 15, 2026

Uh oh!

DanielEScherzer commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

DanielEScherzer commented Apr 19, 2026 •

edited

Loading

xtqqczze commented Apr 29, 2026 •

edited

Loading

DanielEScherzer commented Apr 29, 2026 •

edited

Loading

DanielEScherzer commented May 9, 2026 •

edited

Loading