Export srs_diff_est() by vinniott · Pull Request #340 · stan-dev/loo

vinniott · 2026-03-22T11:27:41Z

Fixes #333

Note:

There is no @example yet because I did not have time yet to get fully familiar with the whole package.
I updated NEWS.md as suggested in CONTRIBUTING.md but I am not sure whether I did that correctly.

codecov-commenter · 2026-03-22T14:46:37Z

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.80%. Comparing base (7eafeb8) to head (6863c50).
⚠️ Report is 47 commits behind head on master.

Files with missing lines	Patch %	Lines
R/print.R	91.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #340      +/-   ##
==========================================
+ Coverage   92.78%   92.80%   +0.02%     
==========================================
  Files          31       31              
  Lines        2992     3004      +12     
==========================================
+ Hits         2776     2788      +12     
  Misses        216      216

jgabry · 2026-03-23T17:16:37Z

Thank you @vinniott.

> There is no @example yet because I did not have time yet to get fully familiar with the whole package.

@avehtari or @MansMeg, is there any specific example you'd like to use for this in the documentation?

avehtari · 2026-03-23T18:13:27Z

The example should be based on a flexible enough model that elpd(log_lik_matrix) and loo(log_lik_matrix) differ by more than 1. The current example_loglik_matrix() has too few observations. The example used in the subsampling tests is a one-parameter model. Should we have a real model, or store another example loglik matrix?

After we have useful loglik matrix, the example code would be something like

# Use posterior predictive density as the fast but biased method for all observations
lpd <- elpd(log_lik_matrix)
sum(lpd$pointwise[,"elpd"])

# Use PSIS-LOO for subsample of 50 randomly selected observations
N <- ncol(log_lik_matrix)
idx <- sample(1:N, 50)
elpd_loo_sub <- loo(log_lik_matrix[,idx])
# Scale the subsample sum up to all N observations
N / 50 * sum(elpd_loo_sub$pointwise[,"elpd_loo"])

# Use difference estimator to combine fast result and subsampled accurate result
loo:::srs_diff_est(lpd$pointwise[,"elpd"], elpd_loo_sub$pointwise[,"elpd_loo"], idx)

# Comparison to using PSIS-LOO for all observations
loo(log_lik_matrix)

This matches what someone was asking
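To make the idea behind the steps above concrete, here is a minimal base-R sketch of the textbook difference estimator for a total under simple random sampling without replacement. The helper name diff_est_sketch and the simulated inputs are hypothetical and only illustrate the mechanics; this is not the loo implementation, and the object returned by srs_diff_est() may contain more than these two quantities.

```r
# Textbook difference estimator for a population total under simple random
# sampling without replacement. Hypothetical helper, not the loo code.
diff_est_sketch <- function(y_approx, y, y_idx) {
  N <- length(y_approx)        # population size
  m <- length(y)               # subsample size
  e <- y - y_approx[y_idx]     # errors of the cheap approximation on the subsample
  list(
    y_hat = sum(y_approx) + N * mean(e),      # estimated total
    v_y_hat = N^2 * (1 - m / N) * var(e) / m  # subsampling variance of y_hat
  )
}

set.seed(1)
N <- 1000
y_true <- rnorm(N, mean = -1, sd = 0.5)          # stand-in for pointwise elpd_loo
y_approx <- y_true + 0.02 + rnorm(N, sd = 0.01)  # cheap, biased stand-in for pointwise lpd
idx <- sample.int(N, 50)

est <- diff_est_sketch(y_approx, y_true[idx], idx)
naive <- N / length(idx) * sum(y_true[idx])      # plain scaled-up subsample sum

c(truth = sum(y_true), naive = naive,
  diff_est = est$y_hat, diff_se = sqrt(est$v_y_hat))
```

Because the estimator only has to extrapolate the errors of the approximation rather than the values themselves, its subsampling variance is typically much smaller than that of the scaled-up subsample sum whenever the approximation tracks the accurate values closely.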

jgabry · 2026-03-23T20:50:06Z

> Should we have a real model, or store another example loglik matrix?

Either is fine by me. Also if we're only using it for this example, we could also just generate an example loglik matrix in the example code instead of storing it.
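As a rough illustration of generating a log-likelihood matrix inside the example code, the sketch below simulates one in base R for a normal model with known standard deviation and a flat prior on the mean. All names and sizes here are assumptions chosen for illustration. Note that this is a one-parameter model, so, as pointed out earlier in the thread, it would show only the mechanics and not a meaningful gap between elpd() and loo().

```r
set.seed(1)
N <- 200   # number of observations (assumed)
S <- 1000  # number of posterior draws (assumed)

# Toy data, and posterior draws of the mean under a flat prior with known sd = 1
y <- rnorm(N, mean = 0.5, sd = 1)
mu_draws <- rnorm(S, mean = mean(y), sd = 1 / sqrt(N))

# S x N matrix: entry [s, i] is log p(y_i | mu_s)
log_lik_matrix <- outer(
  mu_draws, y,
  function(mu, yi) dnorm(yi, mean = mu, sd = 1, log = TRUE)
)
dim(log_lik_matrix)
```

The rows are draws and the columns are observations, which is the layout the snippets in this thread assume when they subset columns with log_lik_matrix[, idx].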

avehtari · 2026-03-24T18:21:21Z

I think the interesting examples can be slow to run. I'll test subsampling with a few interesting real models this week.

avehtari · 2026-03-27T10:38:50Z

This would be a good example, with data from https://archive.ics.uci.edu/ml/datasets/wine+quality

library(dplyr)
library(brms)
options(brms.backend = "cmdstanr")
options(mc.cores = 4)
library(loo)

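# root() is assumed to be a project-path helper defined elsewhere;
# adjust the path to point at your local copy of the CSV file
wine <- read.delim(root("winequality-red", "winequality-red.csv"), sep = ";") |>
  distinct()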

wine_scaled <- as.data.frame(scale(wine))

fitos <- brm(ordered(quality) ~ .,
            family = cumulative("logit"),
            prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3)),
            data = wine_scaled,
            seed = 1,
            silent = 2,
            refresh = 0)

log_lik_matrix <- log_lik(fitos)

N <- nrow(wine_scaled)
Nsub <- 100

# posterior log-score
lpd <- elpd(log_lik_matrix)
sum(lpd$pointwise[,"elpd"])

# Use PSIS-LOO for subsample of Nsub randomly selected observations
set.seed(1)
idx <- sample(1:N, Nsub)
elpd_loo_sub <- loo(log_lik_matrix[,idx])
sum(elpd_loo_sub$pointwise[,"elpd_loo"]) / Nsub * N

# Use difference estimator to combine fast result and subsampled accurate result
loo:::srs_diff_est(lpd$pointwise[,"elpd"], elpd_loo_sub$pointwise[,"elpd_loo"], idx)

# Comparison to using PSIS-LOO for all observations
loo(log_lik_matrix)

p_loo is about 17 here, so the posterior log-score is clearly different from the LOO estimate.
N is 1359, so a subsample of 100 is still only a small part of all observations.
There are no high Pareto-k values to complicate things.
Subsampling with Nsub = 100 gets close to the full result.

As compiling and sampling the brms model takes some time, I would store only the log_lik_matrix but show the code for how it is generated. The rest of the code is fast.

vinniott · 2026-04-01T19:07:58Z

Interesting. Do I understand correctly that log_lik_matrix.rda would be stored in loo/data?
I will try to continue implementing this example towards the end of next week.

jgabry · 2026-04-01T20:14:13Z

We've been using the data directory only for datasets used in vignettes. The way we currently do example log lik matrices is like this:

https://github.com/stan-dev/loo/blob/master/data-raw/generate-example_loglik_array.R
https://github.com/stan-dev/loo/blob/master/R/example_log_lik_array.R

avehtari · 2026-04-26T09:49:26Z

Instead of generating log_lik on the fly, we could use the stored wine log_lik which was added in #352. This would make the example run much faster. We need to check whether the wine log_lik was added only for touchstone and whether it is too big for CRAN, @jgabry, @VisruthSK

github-actions · 2026-04-26T11:02:30Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 30cc34e is merged into master:

✔️loo_function: 1.77s -> 1.77s [-0.1%, +0.52%]
✔️loo_matrix: 1.32s -> 1.32s [-0.42%, +0.26%]
Further explanation regarding interpretation and methodology can be found in the documentation.

vinniott · 2026-04-26T12:44:41Z

The tests are failing so I will give a short summary of what I did.
First, I fetched all upstream updates and added them to this export-srs-diff-est feature branch of the PR.

Second, I renamed generate-example_loglik-array.R to generate-example_loglik-objects.R and also
example_log_lik_array.R to example_log_lik_objects.R.
Here, I added the .example_wine_loglik_matrix to sysdata.rda as you suggested above @jgabry

Third, I added the example by @avehtari to srs_diff_est().

Fourth, I did devtools::document(), devtools::install(), and when I then ran the example of ?srs_diff_est()
it worked on my machine [screenshot from my RStudio session].

So I hope that I shouldn't be too far from getting the tests to pass. Do you have an idea why they might fail?

> Instead of generating log_lik on the fly, we could use the stored wine log_lik which was added in #352.

Of course, that is also an option! I just wanted to give the sysdata.rda implementation a try, as I was close to finishing it when you pointed this out. The tests currently do not pass, though.

VisruthSK · 2026-04-27T15:10:43Z

> Instead of generating log_lik on the fly, we could use the stored wine log_lik which was added in #352. This would make the example run much faster. We need to check whether the wine log_lik was added only for touchstone and whether it is too big for CRAN, @jgabry, @VisruthSK

Yes to both. Clocks in at about 40MB, and is currently only in the touchstone directory.
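The reported size is consistent with a quick back-of-the-envelope check. The sketch below assumes 4000 posterior draws (four chains with 1000 post-warmup draws each, which is an assumption about how the model was fitted) and uses the N = 1359 observations mentioned earlier in the thread. It gives the in-memory size of a double-precision matrix; the compressed on-disk size may differ.

```r
n_draws <- 4000  # assumed: 4 chains x 1000 post-warmup draws
n_obs <- 1359    # number of observations stated earlier in the thread

bytes <- n_draws * n_obs * 8  # 8 bytes per double-precision entry
size_mb <- bytes / 1e6        # 43.488 (decimal megabytes)
size_mib <- bytes / 2^20      # about 41.5 (binary mebibytes)
c(MB = size_mb, MiB = size_mib)
```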

VisruthSK · 2026-04-27T15:27:07Z

Hi @vinniott! Thanks for working on this.

> So I hope that I shouldn't be too far from getting the tests to pass. Do you have an idea why they might fail?

I think we should figure out how we're storing/using the wine data before we fix the tests.

Apologies if you already know this, but if you click on the failed runs, you can scroll down until you see the R CMD check output. Locally, you can run devtools::check(), which takes a long time but approximates the checks that are run here. Here's a link to the main issue in this PR right now: I don't think the wine data is available to the package, so the example fails when trying to find it. You can see that again further down, at L216, where R CMD check complains about not finding a binding for .example_wine_loglik_matrix.

> Fourth, I did devtools::document(), devtools::install(), and when I then ran the example of ?srs_diff_est() it worked on my machine

I think the problem here might be that you ran some code locally before that to make .example_wine_loglik_matrix populate in your local environment. There's a way to restart R sessions in RStudio I think, and in the future if you run into a bug like this where things are okay locally but failing for someone else/on a runner, it might help if you try running the tests in a fresh R session.

Hope that slightly clears up why the tests are failing, and how to (hopefully) reproduce those failures locally.
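One way to tell leftover session state apart from something the package actually ships is to check where the object lives. The following is a small base-R sketch, using the object name from this thread. It inspects whichever copy of loo is currently installed, which may not be the development build of this branch.

```r
obj <- ".example_wine_loglik_matrix"  # object name taken from the thread

# TRUE here suggests the example only works because of leftover session state
in_global <- exists(obj, envir = globalenv(), inherits = FALSE)

# TRUE here suggests the installed package really ships the object
in_pkg <- requireNamespace("loo", quietly = TRUE) &&
  exists(obj, envir = asNamespace("loo"), inherits = FALSE)

cat("in global env:", in_global, "\nin loo namespace:", in_pkg, "\n")
```

Running this in a freshly restarted R session should make the first value FALSE, so a failing example there would match what the CI runner sees.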

vinniott · 2026-05-03T09:30:41Z

Hi @VisruthSK,
thank you so much for the elaborate help, really!
Yes, let's first clarify, how to store the wine loglik matrix.

If it's currently too big for CRAN as is, does this mean that neither touchstone nor sysdata.rda (as suggested above by @jgabry) are possible and that an alternative (data set or data storage method) is needed?

avehtari · 2026-05-05T17:22:27Z

There seems to be some mistake in this PR as it shows in "Files changed" and in commits many things that should not be part of this PR which makes reviewing it harder
I think we should just show the code for how to use srs_diff_est in the doc, but not actually run the code and then we don't need to worry about how much generating log_lik takes or how much space it takes

vinniott · 2026-05-10T13:58:45Z

Okay. There are now a lot of confusing commits. Here is what I did:

As suggested by @avehtari I have now simply added a commented out example.
All this happened in a clean, new feature branch with cherry-picked previous commits (master...vinniott:loo:clean-export-srs-diff-est).
I then git reset --hard clean-export-srs-diff-est onto this existing branch called export-srs-diff-est
because ChatGPT recommended this over creating a new PR with the new clean branch.

I have two etiquette questions:

Do you generally recommend the git reset procedure over a new PR? Even in this case?
What do I best do if I work on a feature/PR branch and then the upstream master updates? Should I merge those commits into my own master as well as my own feature branches? I am asking because this caused me a lot of headache here (though I learned a lot!). Maybe I did it in a wrong way?

VisruthSK · 2026-05-11T15:44:50Z

@vinniott Thanks for making that new, clean branch! I just moved this PR to use that branch, sorry for not asking or anything but I hope you don't mind.

> Do you generally recommend the git reset procedure over a new PR? Even in this case?

Can't speak in general, but personally I didn't want to lose any discussion from this PR. Making a new, clean PR and linking to this one is also okay I guess, but since cleaning this PR up wasn't too hard (with the help of an LLM), I think updating a PR is cleaner.

> What do I best do if I work on a feature/PR branch and then the upstream master updates? Should I merge those commits into my own master as well as my own feature branches?

This is one of the reasons I don't like working on forks :) I think usually you should rebase feature branches onto origin master, and keep the fork's master clean and up to date with origin. You can do a merge commit too, but rebasing is cleaner IMO.

I glanced through this forum post, and it seems good. I would also suggest using the GitHub UI if possible, then the gh CLI, then plain old git. GitHub Desktop might have some features for this too.

Hope this helps!

VisruthSK · 2026-05-11T16:23:41Z

Separately, I think the example should be in a dontrun block instead of commented out.
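A rough sketch of what such a block could look like in the roxygen comments is shown below. It reuses the calls from the example earlier in the thread, assumes that log_lik_matrix has been created beforehand (for instance by the brms code shown above), and assumes that srs_diff_est() is exported so that it can be called without the loo::: prefix. The final wording in the package may differ.

```r
#' @examples
#' \dontrun{
#' # log_lik_matrix: draws x observations matrix, e.g. from log_lik() on a brms fit
#' N <- ncol(log_lik_matrix)
#' lpd <- elpd(log_lik_matrix)
#'
#' set.seed(1)
#' idx <- sample(1:N, 100)
#' elpd_loo_sub <- loo(log_lik_matrix[, idx])
#'
#' srs_diff_est(
#'   lpd$pointwise[, "elpd"],
#'   elpd_loo_sub$pointwise[, "elpd_loo"],
#'   idx
#' )
#' }
```

Unlike commented-out code, the contents of a \dontrun{} block are still rendered in the Examples section of the help page, while being skipped when the examples are run during R CMD check.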

vinniott mentioned this pull request Mar 22, 2026

export srs_diff_est #333

Open

avehtari mentioned this pull request Apr 9, 2026

Rely on posterior for pareto smooth tails #290

Draft

vinniott added 13 commits May 10, 2026 15:10

set up documentation structure

a2db120

srs_diff_est.Rd matches .R documentation

5fe5c34

added documentation

6e8862d

as proposed by @avehtari in issue stan-dev#333

added @Seealso at loo_subsample()

bf6b2b3

added reference Cochran (1977)

6f485ac

removed oudated @return duplicate

67275cf

corrected .R formulas to render in .Rd

caa9ef6

removed example placeholder

ddf1d87

updated .Rd to match .R

9699240

Update NEWS.md

c0a5447

added @examples placeholder

94fb757

added generation of wine example

535f464

added full example

30cc34e

VisruthSK force-pushed the export-srs-diff-est branch from 6863c50 to 30cc34e on May 11, 2026 15:27
